Research Process, Evidence & GPS





Snapshot of the page 11 Apr 2011 Created by Adrian
Snapshot 15 May 2011
Snapshot 18 June 2011

Introduction

I have been concerned for some time that there is a gap in our analyses between the genealogical research process, the evidence & conclusion model and my aspirations for a "scientifically" robust process and write-up. This page is an attempt to fill in that gap and define more specifically what we need - in my personal opinion - to be documenting and therefore what I would like to see in my ideal genealogy research software. Ultimately though, the focus of the page is less on the process, and more on driving out what data items are involved in that process, so I can be more confident about what's needed in the BetterGEDCOM Data Model.

As with all my process descriptions, I usually don't mention whether or not software will be used at any point, because a process is about "what" is done, not "how". Where I do mention software, it's because avoiding doing so makes the words harder to read.

It may very well be that this page does no more than document what you, the reader, felt was obvious. If so fine, but I have satisfied myself that I have filled in the gap.

This process researches a specific event, attribute or relationship concerning something or someone. It does not attempt to set an overall strategic direction. As a result, someone writing a family history will go through this process many times. (Note - I have not yet tested this process in my mind against large-scale family reconstruction, but suspect the same steps appear - just more often.)

Inspiration for this process description has been taken from a variety of sources, credited in the text but especially from Mark Tucker's Genealogy Research Process diagram.

See detail on definitions

See Data Model

1. Set a focussed goal

Set and record a focussed goal (e.g. “Who were the parents of X?”) (cf. Tom Jones, "Inferential Genealogy" course handout, 2010, FamilySearch).
Input
Output

2. Create or revise research plan

It is likely that you need to take several steps to reach the goal. So, split the necessary work into portions, each of which contributes a step towards the overall goal. Each portion will have its own specific objective, which is lower-level / more detailed / more specific than the overall focussed goal. Unless the plan is deliberately intended to be only a partial plan (see below), the last step should pull everything together and provide a solution to the overall focussed goal.

From the steps, with their own objectives, create a “reasonably exhaustive” research plan describing what to search for and how to analyse it. The research plan needs to be broad enough in time, space and people to trap potentially useful information (cf. Tom Jones, also the Genealogical Proof Standard). Do not be afraid to include speculative items, e.g. "Anything in Chester Quarter Sessions records for the 1820s?"

This plan starts as an initial plan - it is highly likely that it will be necessary to loop back here and create a revised plan later on in the light of information found - "No plan survives first contact" (Field-Marshal Helmuth von Moltke the Elder). Indeed, if you are uncertain of the direction your research may take, the initial plan may be only a partial plan, with the rest being defined only in the light of the first set of discoveries.
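As a rough illustration only, the goal, plan and work-portions described above could be held as records along the following lines. The class and field names (`ResearchPlan`, `WorkPortion`, `speculative`, `partial`, `revise`) are assumptions made for this sketch and are not taken from the BetterGEDCOM Data Model; the objectives are invented examples.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WorkPortion:
    """One step of the plan, with its own specific objective."""
    objective: str
    speculative: bool = False   # e.g. "Anything in Quarter Sessions records?"
    done: bool = False


@dataclass
class ResearchPlan:
    """A plan serves one focussed goal and may be revised later."""
    goal: str
    portions: List[WorkPortion] = field(default_factory=list)
    revision: int = 1
    partial: bool = False       # True if later steps are not yet defined

    def revise(self, new_portions: List[WorkPortion]) -> "ResearchPlan":
        """Return a revised plan: keep completed portions, replace the rest."""
        kept = [p for p in self.portions if p.done]
        return ResearchPlan(self.goal, kept + new_portions,
                            self.revision + 1, self.partial)


plan = ResearchPlan(
    goal="Who were the parents of X?",
    portions=[
        WorkPortion("Find the baptism of X"),
        WorkPortion("Anything in Chester Quarter Sessions records for the 1820s?",
                    speculative=True),
        WorkPortion("Pull everything together and answer the goal"),
    ],
)
```

Keeping a revision number on the plan is one possible way to record that the process looped back here; other designs (for example, dated snapshots of the plan) would serve equally well.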

Input
Output

3. Carry out research

For each of the work-portions in the research plan, carry out the research according to the current plan. (When searching for paper-based records, this step takes place in Archives, Record Offices, etc. When searching internet-based records, it takes place at a computer terminal, and the division between this and subsequent steps might tend to disappear.) There are personal decisions to be made about how much to record for sources that are close to the search criteria but do not match: if it's been a long journey and you won't be back for a while, it might be tempting to record "close" sources for long-term storage somehow, in case they turn out to be for a relative. Important - if a search does not find a record, make a note of that, as this will save you looking for it in the same place again. Also, the lack of a record where it might reasonably be expected to be could be significant.
Input
Output

Exception - if this work-portion does not contain any research work (e.g. because it is only analysis), then this step is not executed.
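A minimal sketch of how a search, including one that finds nothing, might be noted so that it is not repeated in the same place. The record layout and the names `SearchLogEntry` and `already_searched` are hypothetical, and the sample entry is illustrative only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SearchLogEntry:
    """One search carried out for a work-portion."""
    work_portion: str           # objective this search served
    repository: str             # where the search took place
    record_set: str             # what was searched
    criteria: str               # what was looked for
    found: List[str]            # summaries of matching records (may be empty)
    note: Optional[str] = None

    @property
    def is_negative(self) -> bool:
        """True when the search found nothing at all."""
        return not self.found


def already_searched(log: List[SearchLogEntry], repository: str,
                     record_set: str, criteria: str) -> bool:
    """Check the log before repeating a search in the same place."""
    return any(e.repository == repository and e.record_set == record_set
               and e.criteria == criteria for e in log)


log = [
    SearchLogEntry(
        work_portion="Find Thomas Taylor in the 1851 census",
        repository="Ancestry",
        record_set="1851 census",
        criteria="Thomas Taylor, born Penwortham, Lancashire",
        found=[],
        note="No match; birthplace may have been given as a nearby place.",
    ),
]
```

Because the criteria are stored with the result, a later search with loosened criteria shows up as a new search rather than a repeat.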

3.5 Understand the Records

For each of the work-portions, check your understanding of the records that have been judged to have useful information. Understanding why a record was created will help you interpret the information in it. (For instance, does the grant of probate say 'Personal Effects' and / or 'Real Estate' - do you understand the difference?) (cf. Tom Jones, "Inferential Genealogy")

4. Select & Analyse the Evidence

For each of the work-portions, assemble the research from this work-portion. Assess the quality of the source material. Using your research plan, select the evidence (i.e. the information that's relevant to the objective of this work portion). This evidence can come from this work-portion, previous portions and the evidence already in your database. Look for any patterns, matches, differences, etc. that might be meaningful. (These are not meant to be sequential steps but can take place in parallel)
Can you demonstrate ("prove") that the person referred to in the evidence is the one that the objective needs? Analyse the evidence to see if you can answer the questions posed by the objective for this work-portion.
What are the conclusions for this work-portion? Is there any conflicting evidence? (e.g. this looks like him but it's the wrong father) Any partial progress? (e.g. These are the marriages matching our couple but there is more than 1 match, so we cannot yet tell which is their marriage). Any evidence that contradicts any hypothesis? (e.g. Implication of complete search is that they were not married after all)
If there are conclusions, record the analysis and conclusions for this work-portion in some form of proof summary or proof argument, referring back as necessary to previous conclusions.

Input
Output

Exception - if this work-portion does not contain any analysis work (e.g. because it is a portion containing only self-education), then this step is not executed.

5. Has the Objective been met for this Work-Portion?

If the objective for this work-portion has not been met, or if there is conflicting evidence that cannot be resolved, return to step 2 and create a revised research plan with revised specific objectives. (Some conflicts can be accepted if they can be resolved, i.e. explained away in a plausible manner - e.g. the information was given 50 years after the marriage, and the son giving it was born long after that marriage.)

5.5 Record Conclusions

For each specific objective, if it has been met with no unresolved conflicting evidence, enter any conclusions into the genealogy application, either

Exception - if this work-portion did not reach any conclusions (e.g. because it is a portion containing only self-education), then this step is not executed. If this work-portion was supposed to reach a conclusion, but for some reason could not, then review if there are any other conclusions that can be entered into the database - e.g. "We don't know his parents but we now know his occupation at date X". Ensure that such conclusions can be justified - perhaps creating a new work-portion to do so.

6. Go onto next work portion in research plan

Go onto the next work portion in the research plan.

7. Check overall goal has been met

If all the objectives from all the work portions have been completed and there is no unresolved conflicting evidence, double check to see if the overall focussed goal has been met (remember, in this way of working, the last portion is meant to provide the answer to that focussed goal) and that all the conclusions have been entered into the genealogy application, including the final conclusions. If it has, then this research process is complete.

If there is any unresolved conflicting evidence or if the overall focussed goal has not been met, return to step 2 and create a revised research plan with revised specific objectives.
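The looping between steps 2 and 7 can be sketched as control flow. This is an illustration under stated assumptions, not a definitive rendering of the process: the function names are invented, steps 3 to 5.5 are collapsed into a single callback, and the `max_revisions` limit is an addition to keep the sketch from looping forever (the process itself sets no such limit).

```python
from typing import Callable, List


def run_research(goal: str,
                 make_plan: Callable[[str, int], List[str]],
                 work_portion: Callable[[str], bool],
                 goal_met: Callable[[str], bool],
                 max_revisions: int = 5) -> bool:
    """Steps 2 to 7 as a loop; returns True once the goal is met."""
    for revision in range(1, max_revisions + 1):
        objectives = make_plan(goal, revision)       # step 2: create or revise plan
        all_met = True
        for objective in objectives:                 # step 6: next work-portion
            # Steps 3 to 5.5 (research, understand, analyse, record)
            # are hidden inside work_portion for this sketch.
            if not work_portion(objective):          # step 5: objective met?
                all_met = False
                break                                # loop back to step 2
        if all_met and goal_met(goal):               # step 7: overall goal met?
            return True
    return False


# Illustrative use: the first plan fails, the revised plan succeeds.
attempts: List[str] = []


def make_plan(goal: str, revision: int) -> List[str]:
    if revision == 1:
        return ["Find the baptism of X in the home parish"]
    return ["Find the baptism of X in neighbouring parishes",
            "Pull everything together"]


def work_portion(objective: str) -> bool:
    attempts.append(objective)
    return "home parish" not in objective            # the first attempt fails


result = run_research("Who were the parents of X?", make_plan,
                      work_portion, goal_met=lambda goal: True)
```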

Diagrammatically

ResearchProcess-ABv3.png

Caveats

I talk of "proof" and "final conclusions". It is unlikely that proof in a complex genealogical study can reach the standard of proof required in a criminal case ("beyond a reasonable doubt") - the Genealogical Proof Standard exists to provide criteria to judge the standard of proof obtained. Nor is any conclusion really final as the appearance of previously unsuspected information may throw everything into suspicion.

Application Software

The crux of the matter is this - what do I want to see in an application? And therefore in the BetterGEDCOM Data Model? The answer is - everything that's recorded as an input or an output above. Actually, there are exceptions denoted above by the phrase "probably this data is held elsewhere" since otherwise we'd end up dumping all the text books into our applications. But after these first steps, I'd really like all that lot to go into my application.


Data Analysis

First cut list of entities - excluding those already clearly covered by GEDCOM - i.e. persons, families, etc. This list is somewhat descriptive, rather than specific.

The indented bullets are intended to imply a probable relationship - e.g. the higher level bullet consists of the lower level ones.

Research-DataModel-ABv3.png
First cut of data model shown above. Notes:

Comments?

Comments

AdrianB38 2011-04-11T08:30:20-07:00
Introduction
I have been concerned for some time that there is a gap in our analyses between the genealogical research process, the evidence & conclusion model and my aspirations for a "scientifically" robust process and write-up. This page is an attempt to fill in that gap and define more specifically what we need - in my personal opinion - to be documenting and therefore what I would like to see in my ideal genealogy research software.

It may very well be that this page does no more than document what you, the reader, felt was obvious. If so fine, but I have satisfied myself that I have filled in the gap.

You may note that several phrases deliberately echo the G Proof Std. This is, so far as I remember but then again I'm getting old and forgetful, the first time I've seen the full sequence right down to the entry of data into the application under EITHER the conclusion only or the evidence and conclusion model.

The process, especially the last bit, clearly demonstrates to me that the evidence and conclusion model does not challenge what one might term current formal genealogical processes but supplements it, and that conversely, carrying out those processes is not, of itself, enough if one wants the full science.
AdrianB38 2011-04-11T13:50:20-07:00
OK Tom - one more comment, this time around item 3.

I'm not sure I'd create the evidence records at this point. HOWEVER I suspect this has everything to do with practicalities of how good apps are at EASILY loading data. If I'm looking in the 1851 census for a Thomas Taylor, born Lancashire 1829 +/- 10y, with a wife named Mary, there are 18 of them. If I'm doing it the way I do now, the SUMMARY list off Ancestry of those 18 is just copy and pasted into a Note item in my database.

My step 4 asks if short-term objective has been met? Well, no because none of those 18 look like my Thomas. (And that's where I'm stuck with my GG grandfather - I can see him in 1841 and 1861 onwards but not 1851).

Now - I can't convince myself that there's any virtue in loading source records, never mind extracted evidence records, of those 18.

I think this means that I would wait until step 4, the analysis before entering the stuff into the app. That kind of makes some sense to me because step 3 is pretty much what happens at the record office and step 4 is thinking time.

Now, it could be that I'm not phrasing the task right because what I'm actually doing is looking for someone born at Penwortham in Lancs (yes - our census gives the town of birth). But if I do use that as the search criteria, then I get zero records (which is fine) but I've also lost the ability to say 6 months later "D'uh - he could have been born at Hutton, just outside Penwortham - let's go back and recheck"

Now, if Ancestry downloaded everything into BG format, I could load everything easy-peasy. But since it won't I'm reluctant to do the equivalent of typing up 18 evidence cards to shuffle, when I can just summarise them in a few lines and mentally shuffle.

IF my app did that shuffling for me, I'd input it. If it doesn't, I'm unconvinced that inputting the full evidence makes sense. (If it was Pickstock or Pleass, where you'd be convinced that a large number of them would turn out to be related, it would make sense to put them in. But not for Taylor.)
ttwetmore 2011-04-11T17:53:36-07:00
"OK Tom - one more comment, this time around item 3...I'm not sure I'd create the evidence records at this point. HOWEVER I suspect this has everything to do with practicalities of how good apps are at EASILY loading data. If I'm looking in the 1851 census for a Thomas Taylor, born Lancashire 1829 +/- 10y, with a wife named Mary, there are 18 of them. If I'm doing it the way I do now, the SUMMARY list off Ancestry of those 18 is just copy and pasted into a Note item in my database."

This is a dilemma. I've been doing genealogy for over 20 years now. It took a long time for me to reach the "chasm" you reach when you can no longer discover new ancestors without doing careful research and gathering lots of often conflicting evidence.

My current practice is to create the evidence records whenever I find information that has "high likelihood" of belonging to a person of interest. Faced with 18 census records for persons with the same name as the person I am researching, where it is difficult or impossible to filter out a significant number of them, I would only create records for those that I decide have any real chance of being the persons I am interested in.

"My step 4 asks if short-term objective has been met? Well, no because none of those 18 look like my Thomas. (And that's where I'm stuck with my GG grandfather - I can see him in 1841 and 1861 onwards but not 1851)...Now - I can't convince myself that there's any virtue in loading source records, never mind extracted evidence records, of those 18...I think this means that I would wait until step 4, the analysis before entering the stuff into the app. That kind of makes some sense to me because step 3 is pretty much what happens at the record office and step 4 is thinking time."

Okay, I now get your point. In step 3 you are deciding whether any of the information you have found could possibly apply to your ancestors. If you decide not, you also wouldn't create evidence records for them. I agree with you. I don't either. I didn't see step 3 the same way at first. I do step 3 by looking at a result from Ancestry.com or FamilySearch or a vital record in a city hall, rejecting it, and moving on to the next one. It's over so fast I don't think about it much.

"Now, it could be that I'm not phrasing the task right because what I'm actually doing is looking for someone born at Penwortham in Lancs (yes - our census gives the town of birth). But if I do use that as the search criteria, then I get zero records (which is fine) but I've also lost the ability to say 6 months later "D'uh - he could have been born at Hutton, just outside Penwortham - let's go back and recheck"

(Aside: I've been into the English, Welsh, and Manx censuses many times for my ancestors -- they are a pleasure to work with).

You hint at the issue of negative evidence here. This issue still eludes me. Where does that info go?

"Now, if Ancestry downloaded everything into BG format, I could load everything easy-peasy. But since it won't I'm reluctant to do the equivalent of typing up 18 evidence cards to shuffle, when I can just summarise them in a few lines and mentally shuffle."

That hits the nail on the head. If we can't get the evidence records from our services, will we ever be able to effectively do record-based work on our home applications? The future of genealogical software may rest on the answer. I don't believe Ancestry.com will open up their interfaces for this purpose, but I have hope that FamilySearch will.

"IF my app did that shuffling for me, I'd input it. If it doesn't, I'm unconvinced that inputting the full evidence makes sense. (If it was Pickstock or Pleass, where you'd be convinced that a large number of them would turn out to be related, it would make sense to put them in. But not for Taylor.)"

So if genealogical software does not support evidence-based records in any meaningful way, what's the point of having the evidence records in the first place? None. I anticipate the next generation of genealogical software that does support it and hope Better GEDCOM will be ready for it. My hope may be completely false. The trend in all software over the past 30 years has been a steady dumbing down of features in order to attract a larger base of less and less capable users. Genealogical software has suffered this trend as much as others. Why should I have any hope that a "research quality" genealogical application will ever exist? I am delusional.

Fortunately I am convinced that it is trivial to extend genealogical models to support evidence records, because of the very simple fact that evidence records are just person records and event records that are limited to holding the facts that can be derived from single items of evidence. The same records we use now for our conclusions are fully adequate to hold evidence. And simply by allowing person records to refer to another set of person records as part of their justification, we get everything I am asking for. So even if the Better GEDCOM model drops back and punts on the evidence question, it will be pretty easy to extend it in the future.
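A minimal sketch of that idea, assuming one `Person` record type used for both evidence and conclusions, with a conclusion person pointing at the person records that justify it. The field names (`facts`, `source`, `based_on`, `reasoning`) are invented for illustration and are not taken from the DeadEnds or Better GEDCOM models; the sample data is illustrative.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Person:
    """Same record type for evidence persons and conclusion persons."""
    facts: Dict[str, str]
    source: Optional[str] = None                             # set on evidence persons
    based_on: List["Person"] = field(default_factory=list)   # justification
    reasoning: str = ""

    @property
    def is_evidence(self) -> bool:
        # An evidence person holds facts from a single item of evidence
        # and is not itself built from other person records.
        return self.source is not None and not self.based_on

    def evidence(self) -> List["Person"]:
        """Walk down the tiers to the underlying evidence persons."""
        if self.is_evidence:
            return [self]
        found: List["Person"] = []
        for person in self.based_on:
            found.extend(person.evidence())
        return found


in_1841 = Person({"name": "Thomas Taylor"}, source="1841 census")
in_1861 = Person({"name": "Thomas Taylor"}, source="1861 census")
thomas = Person({"name": "Thomas Taylor"},
                based_on=[in_1841, in_1861],
                reasoning="Same household members in both censuses.")
```

Because `based_on` can point at other conclusion persons as well as evidence persons, the same structure would allow more than two tiers.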
gthorud 2011-04-11T19:47:54-07:00
I think we will have to create a model that will allow flexibility.

In one extreme a lot of source transcripts come out of step 3, the user analyses them and existing data in the database, records the reasoning (maybe also a conclusion) in a research note somewhere, and goes on to create conclusion persons, relations etc. Some users will think that it is a waste of time to record evidence persons.

Or, the user records the transcripts and extracts the bits and pieces from each transcript into evidence persons, and then creates conclusion persons, recording the reasoning in the conclusion person, and creates the relations.

The real difference is the use of evidence persons (and low level conclusion persons) and where the reasoning is recorded – together with the objective or in a conclusion person. I think in practice, a mix of both will be used by ONE researcher. If there are several collaborative researchers, as on FS, it makes sense to enforce evidence persons. The choice of where to record the reasoning may depend on what you are researching; I have difficulty choosing between the two alternatives. How would the reasoning be output in reports if it is spread all over the place (and how do I get an overview on screen of all that reasoning and evidence if it is split into many pieces)?

Also, you can’t record the reasoning about a place, group or ship fact in a conclusion person – we need a general research/reasoning/citation structure.

If we delete a person, where do we record the reasoning behind that?
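One possible shape for such a general reasoning structure, sketched under the assumption that reasoning is held in its own record that refers to other records by identifier. The names `RecordRef`, `ReasoningNote` and `notes_about`, and the identifiers used below, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RecordRef:
    """Reference to any record by kind and identifier."""
    kind: str        # "person", "place", "group", "ship", ...
    record_id: str


@dataclass
class ReasoningNote:
    """Reasoning held separately from the records it discusses."""
    objective: str
    text: str
    about: List[RecordRef] = field(default_factory=list)
    evidence: List[RecordRef] = field(default_factory=list)


def notes_about(notes: List[ReasoningNote], kind: str,
                record_id: str) -> List[ReasoningNote]:
    """Collect every note that concerns one record, for an overview."""
    target = RecordRef(kind, record_id)
    return [n for n in notes if target in n.about]


notes = [
    ReasoningNote(
        objective="Is this ship the one named in the passenger list?",
        text="Tonnage and master's name agree in both records.",
        about=[RecordRef("ship", "S1")],
        evidence=[RecordRef("source", "SRC7")],
    ),
    ReasoningNote(
        objective="Should person P9 stay in the database?",
        text="No evidence found for this person; record deleted.",
        about=[RecordRef("person", "P9")],
    ),
]
```

Since the note refers to a record by identifier rather than living inside it, the reasoning for a deletion can outlive the deleted record.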

I think Adrian has done an excellent job. We should try to further develop the text. Update what is done in step 3. Include negative evidence. Taking the two extremes above, could we add to it the alternative places where we would record info in each step – from step 4 onward? Also, compare this with the Evidence and Conclusion Process – I don’t see a big difference between step 3 onwards in it, and Tom’s E&C model – I need to understand the differences, if any. Could we develop names for each step, possibly not a short name in every case?

It would be useful to save dated versions of the model so it will be possible to understand the discussions afterwards - e.g. if we change the step numbers.
GeneJ 2011-04-12T01:53:26-07:00
Adrian,
You wrote, "phrases deliberately echo the G Proof Std"

I practice the GPS differently, at least I think.

The element, "Reasonably exhaustive search," from the GPS contributes to credibility because it (1) "assumes examination of a wide range of high quality sources, and (2) minimizes the probability that undiscovered evidence will overturn a too hasty conclusion."
http://www.bcgcertification.org/resources/standard.html

I don't associate the "exhaustive search" process with the more narrow, initial or "short term" search.

See Tom Jones, "Inferential Genealogy," 2010
http://bit.ly/ev0YId

In the example, his "Start with a Focused Goal" is not unlike your "high level objective" (his was, "Identify the parents of the Maxfield Whiting who married Lettice Johnson in 1753.").

His step 2 seems quite a bit different though--he is going to search broadly. Note in part, "When you have identified a target ancestor, you should research him or her from birth to death, and then at least a decade before and after to make sure you do not miss some pertinent information."

Do you see that as I do?

The theme continues, in a process I see as building a body of evidence. Your #2 looks a little like the first couple of bullets to his #3--Understand the Records.

His #4, Correlate the evidence--I see as a body of evidence process.

The purpose of his class was inferential genealogy. He writes, "Inferential genealogy is one method of kinship determination."

I appreciate you deal with different record groups and circumstances, and that this process is comfortable to _you_, how you think and how you like to work.

I'm hoping to see something concrete that deals with a more substantial body of evidence over time, reflecting a good diversity in record types.
AdrianB38 2011-04-12T08:55:42-07:00
Tom, I think I agree with absolutely everything you wrote. In particular, I'm glad that you'd only "create the evidence records whenever I find information that has "high likelihood" of belonging to a person of interest."

Re negative evidence - err - not sure on this yet. I think _negative_ evidence appears only at stage 4, when I'm back "home" and I analyse what I've found - only then am I in a position to say "She ain't there..." (Of course, in reality, it's often actually a sinking feeling in the Record Office as you run off the end of the microfilm and you still haven't found her).
AdrianB38 2011-04-12T09:09:47-07:00
Geir
I think you're right - there needs to be flexibility in step 3.

If I'm in a record office, I'll be making full transcripts or extensive summaries of all records that meet the search criteria (depending on the size of the document and the likelihood of the person being of interest either now or later).

If I'm searching Ancestry or FS it'll be a copy and paste of the summary of the records found and I'll _save_ the image or copy the text for a much smaller range because I know I can return to them later.

"If we delete a person, where do we record the reasoning behind that?" I think that would depend on where the person came from and why. There's at least an argument for leaving said person around with big warnings - e.g. the Robert Bruce who fought at the Battle of Hastings is very possibly a myth - at least, there's no real evidence for him. So maybe I might leave him in my database but with the warning that no, he almost certainly doesn't exist. Whatever seems appropriate - otherwise I think the deletion gets recorded in step 8.

"compare this with the Evidence and Conclusion Process" (or E&C Model) - I don't think there will be. Certainly not intentionally.
AdrianB38 2011-04-12T09:11:14-07:00
D'uh - missed a bit...

"compare this with the Evidence and Conclusion Process" (or E&C Model) - I don't think there will be MANY DIFFERENCES. Certainly not intentionally.
AdrianB38 2011-04-12T09:33:09-07:00
Gene
"I practice the GPS differently" - yes, this is almost inevitable. I'm sure you're a lot more familiar with it than I am - I think I'm still feeling my way towards it.

""Start with a Focused Goal" is not unlike your "high level objective"
Actually, that's probably a better phrase altogether - "high level objective" can mean anything and in fact the example was trying to show it's a fairly specific objective I was investigating.

The Tom Jones handout is very interesting - in truth, it's possibly a lot closer to what I'm doing with the more distant relatives than what I've written.

_Possibly_ my process could be used to cover an inferential investigation but the examples would look very different. E.g. "search plan" might contain "Look for all family X records in Y-shire in the years...."

I think his Step 4 "Correlate the Evidence" then matches multiple iterations of my 4 onwards. Maybe. Certainly, phrases like "Look for patterns and parallels" cover many, many things in practice. Patterns need to be interpreted and tested, for instance.
ttwetmore 2011-04-12T12:15:16-07:00
There is a lot of talk about supporting GPS, but most of the thinking in the area seems to concentrate on citing sources.

When I talk about the evidence and conclusion process, when I talk about multi-tiered models, when I talk about record-based genealogy, when I talk about evidence person records, when I talk about crossing the chasm, I am talking about the paradigm shift in genealogical software required to ACTUALLY PROVIDE REAL EXPERT LEVEL COMPUTER support for the GPS. Supporting GPS was the purpose of the DeadEnds model. Everything I continuously blabber on about is pointed straight in the direction of trying to find the best way to support GPS in computer systems.

What I find most ironic about Better GEDCOM right now is that the key opposition to this paradigm shift comes from a few BG members who most profess the need to support GPS! They are unable to imagine what it might mean to provide real expert level COMPUTER SUPPORT for the GPS process, continuing to believe the current generation of conclusion-based software systems are all that are needed. Crazy, man.
gthorud 2011-04-13T14:24:10-07:00
GeneJ wrote: "I'm hoping to see something concrete that deals with a more substantial body of evidence over time, reflecting a good diversity in record types. "

I think it would be useful to start building that example independent of the technical solution. It would be ONE stress test for Tom's model which I think is important.



It is interesting to note that a lot of the steps on the page would be supported by Administrative requirements already included in the Requirements Catalog.
GeneJ 2011-04-15T10:53:16-07:00
Body of Evidence:
If I can think of a way to do it, we might be able to use little Hannah Preston's family.
We'd have 14 children (Hannah and her siblings) from two marriages. Aside from the Hannah proof, there are two other proofs. In addition, one of the spouses can't be proven to her parents and siblings, but there is enough circumstantial evidence that one needs to report that other family as "loosely linked" (for which I use a "who's this tag"). It's possible all the documents are in the public domain but for one. I'll look more closely at that.

Admin/Requirements Catalog: I am looking forward to hearing from Adrian on his thoughts.
GeneJ 2011-04-16T06:58:10-07:00
Geir wrote, "conflict between the “all in a citation” and the “research argument” on one side, and the E&C model on the other side. ...Adrian proposes an Analysis structure that will be used by the researcher during the process that tries to “answer” an objective, and how that structure is linked to other records."
From the fourth posting above, I understand Adrian didn't record his objective in the software, but Admin Research (research log) would have enabled the same; see the example at:
re: Administration01 - Research Administration Information
theKiwi Mar 8, 2011 5:13 am
http://bettergedcom.wikispaces.com/message/view/Better+GEDCOM+Requirements+Catalog/35440146

I see that step as Tom Jones', "Inferential Genealogy," 2010 http://bit.ly/ev0YId "Start with a Focused Goal" (his was, "Identify the parents of the Maxfield Whiting who married Lettice Johnson in 1753").

I search broadly (see Inferential Genealogy). FamilyHistory Library Catalog, FamilyHistory101 (see the map or listing of States; for any given state, see the list of counties), local libraries and archives, etc. are part of the process to identify available record groups--this collective is the way I begin to define "exhaustive search."

https://www.familysearch.org/#form=catalog
http://www.familyhistory101.com/

I work with one record group at a time. How my notes look would depend on the circumstance of the record group. (Deed indexes appear differently than Vital Record indexes and probate indexes; library archive catalogs appear differently than newspaper indexes--other than maybe grantor/grantee deed indexes, rarely would I commingle the findings much less create data-base proper records about them.)

If my software had Research Administration features, I'd use that to summarize information about the record group and note which records I'd want to look at further (if any). (Adrian's "SUMMARY list off Ancestry of those 18.") Also, for any record I looked at more closely. (Adrian wrote, " .. none of those 18 look like my Thomas.")

I wouldn't load any of the records into the database proper.

If, of those 18 items, say three were interesting, I'd continue the research about those three entries, possibly in another research log entry. So I'd devise another set of methods by which I intended to evaluate and determine if any of the three records could be proven to be that of Thomas ... Those methods would involve examining the records themselves and examining yet other records (other records being both those from an existing body of evidence and those I might have to go track down).

As a rule of thumb, I don't want to create two people in my data base who "might be the same person." In many respects, I just see that practice as another form of mis-identification.
GeneJ 2011-04-11T09:52:00-07:00
This is a great posting. I am so glad to see the term "science" on the table.

From something I wrote last week, but didn't post--"Where do each of us draw the line between the discipline of genealogy and science? Genealogical research and say, market research? How do we differ in our view of genealogy being unique within the larger discipline of history?

In genealogy, we interpret "evidence" and draw conclusions. In a strict sense, do you see that those interpretations, however faint, are influences or "changes"?

The commingling of evidence and conclusions, as the spirit of the model, as described to me when this project started, challenges how I see genealogy as a discipline.

Even the use of the term evidence bothers me. Here's the original comment from what I think was proposed as the "Evidence and Conclusion Process" page.

http://bettergedcom.wikispaces.com/page/diff/Evidence+and+Conclusion+Process/179965381

I'll post separately about the different steps if it would help, but note the comment, "The terms evidence and conclusion are used loosely here, since even the original event and person records may involve some level of inference as the researcher creates the event and person records from the evidence."

Meeting time. Perhaps more later.
ttwetmore 2011-04-11T11:09:08-07:00
Adrian,

Here's how I see a genealogical computer application supporting your process, numbered in your sequence.

1. This is the goal setting phase. The results are goal records in your database.

2. You create a list of action/todo items you believe will take you to your goal. These items are also in your database and refer to their goals.

3. You record the repositories visited and the sources inspected as records in your database; records that have sufficient structure that citations can be generated when needed. The sources can be referenced by the action items and vice versa. Your step 3 goes even further, however, in my methodology, to the creation of the first level of "evidence" records, that is, person and event records that contain just the information that can be extracted from items in the sources. These evidence records refer back to the sources with the additional info such as page numbers, so that complete citations can be generated for them.

Note that what I call evidence records are your "contents of the researched sources" put into textual, digital form and stored in the database as records. There is misunderstanding revolving around this idea, and there is very strong objection to this idea from some Better GEDCOM folk who think there is no point in creating these records. (Since the current genealogical industry now consists of applications and services, e.g., FamilySearch, Ancestry.com, who have billions of these evidence person records on file for us to search and use, the lack of imagination behind such obstinacy is hard to understand).

I'll emphasize this once more. Evidence records are digital records in our databases that hold only and exactly the information about persons (and events) that are mentioned in items of information we find in the sources. Some Better GEDCOM folk don't believe these records are necessary. Others, like me, are convinced that they are required for the future of Better GEDCOM. Those who don't want them lack the understanding of how valuable they are, how important they are in supporting the research process, and how important they will be as genealogy continues the current accelerating trend to be more and more records based. These evidence records are the bedrock concept behind records-based genealogy; they are the computer incarnation of these records.

4. Determine if the evidence gathered in step 3 is sufficient to say you have reached your goals. Your computer application should support this by allowing you to group your evidence records (as computer records) into groups that allow you to confidently say you understand the persons whom you were researching. A good computer application would allow you to see all your evidence, group it into conclusion persons, document your conclusions, and so on. Persons who disagree with this believe that all this should be done on paper and in ancillary log files.

5., 6., 7. Iterate on the above until you feel you are justified in making your final conclusions about reaching the goal.

8. And here you show the bias I am concerned about, as this is the first time you mention the computer application. For me the computer application starts at step 1 and supports them all.
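The record types described in steps 1 to 4 can be sketched in Python. This is only an illustration under stated assumptions: every class and field name below is hypothetical and is not taken from any BetterGEDCOM draft or from the DeadEnds model, and the sample register and repository are invented placeholders.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Goal:
    question: str  # step 1: the focussed goal


@dataclass
class ActionItem:
    description: str
    goal: Goal  # step 2: each to-do item refers to its goal


@dataclass
class Source:
    title: str
    repository: str


@dataclass
class EvidencePerson:
    # Step 3: holds only what one item in one source says about a person.
    name: str
    source: Source
    where_in_source: str  # e.g. a page number, so a citation can be generated

    def citation(self) -> str:
        return f"{self.source.title}, {self.where_in_source}; {self.source.repository}"


@dataclass
class ConclusionPerson:
    # Step 4: the researcher's grouping of evidence records judged to be one person.
    label: str
    evidence: List[EvidencePerson] = field(default_factory=list)
    reasoning: str = ""


# Hypothetical sample data, for illustration only.
goal = Goal("Who were the parents of X?")
todo = ActionItem("Search the parish register for a baptism of X", goal)
register = Source("Example parish register", "Example record office")
baptism = EvidencePerson("X", register, "p. 12")
burial = EvidencePerson("X", register, "p. 87")
person_x = ConclusionPerson("X", [baptism, burial], "Same name and parish.")

print(baptism.citation())
```

The evidence records stay untouched when they are grouped; the conclusion person only points at them, so a grouping can be revised in a later iteration (steps 5 to 7) without losing what each source actually said.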
AdrianB38 2011-04-11T13:14:51-07:00
Tom - just a quick reply re #8:

Don't worry: I agree with your view of where the application should go in at... c.f. "what do I want to see in an application? ... everything that's recorded as an input or an output above"

The fact that I don't mention the application in the earlier steps is simply because I'm writing out the process, which stands independently of whether we're using an app or not. That's how I've always written process descriptions - until one gets to the point where avoiding the use of the word "computer" leads to diminishing returns because no-one can understand what the heck you're on about... Which point was reached at number 8. Actually - I feel guilty that point 8 is so brief compared to the others but I really didn't want to rewrite the evidence / conclusion model _again_
ttwetmore 2011-04-11T13:24:01-07:00
Adrian,

Great! Sorry for my misinterpretation.

TW
AdrianB38 2011-04-13T14:04:19-07:00
The Missing Link - a new entity type or a new type of source?
I think I was so pleased to get my head round a process for research (not sure yet if it can represent any typical process for research) that I completely forgot a topic that I have been typing about for ages until this morning when I had a "D'uh" moment...

The process on this page ("Research Process, Evidence & GPS") mentions various outputs created along the way that would all be entered into my dream genealogy application - in particular, there are various results of analyses, interim and final conclusions (possibly several iterations of interim conclusions), etc, etc.

"Then" the procedure enters the events and attributes for the people (or groups or whatever). (I put "Then" in quotes because this might well have happened - partially - earlier on)

I didn't mention that the events and attributes would point back to the relevant source records (i.e. enter the "citations" and please let's not debate the "correct" use of that word here!) because that was a statement of the obvious (or that's my excuse)

But what I also didn't do was link the various results of analyses, interim and final conclusions, etc, etc, to anything. It's just sitting there in my database! So one might have a fact that the father of John Doe, born Springfield 1795, is X, with appropriate source records linked / "cited" - but no word of the umpteen paragraphs of argument, logic, conclusions, discards, etc, that justify interpreting those records thus. Bit of a missing point that.

In fact, if you think about it, if I used an article from a learned genealogy society "Who were the parents of X?" as my source, I'd be better off because the events and attributes would point to the article and all the logic would be in there. Whereas if I've done the primary research, the events and attributes would, under the current GEDCOM inspired structure, ONLY point to the sources and NOT to the logic explaining how to combine the structure.

I would therefore propose that we need to create a NEW, TOP-LEVEL ENTITY TYPE ("Analysis"?) in BetterGEDCOM to contain the various results of analyses, interim and final conclusions, etc, etc, and that these entities get linked to facts, attributes, relationships, other entities exactly like source records get linked via "citations"

Except, except... If it's "exactly like", then why not make it the same?

So - an ALTERNATIVE to a new entity type is to simply create a Source entity / source record to contain (singly) each of our various results of analyses, interim and final conclusions, etc, etc, and link these with citation type links. The (new type of) source record would contain in its "Text from Source", not a transcript of a document, but the text of the logic. This avoids creating yet another new entity type. In addition, it seems to me that this is vaguely reminiscent of the sourcing strategy that Tom was proposing for his conclusion record, which pointed - IF I remembered and understood correctly - simply to the evidence records below, rather than to the source records. This Tom-type source would be supplemented(?) by this new source record for the logic.

I haven't yet rigorously identified the sort of stuff to go into this new type of source or the new entity - there's possibly a bit too much "etc" going on here.
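To make the two alternatives concrete, here is a minimal Python sketch. It assumes hypothetical names throughout (`Analysis`, `Source.kind`, `Fact.links`, `reasoning_for`); none of them come from GEDCOM or from an agreed BetterGEDCOM model. Option A uses a separate top-level entity, option B reuses the source record with the logic stored in its "Text from Source".

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Source:
    title: str
    text: str = ""          # "Text from Source": a transcript, or the logic itself
    kind: str = "document"  # hypothetical discriminator; "analysis" for option B


@dataclass
class Analysis:
    # Option A: a separate top-level entity holding the reasoning.
    title: str
    text: str


@dataclass
class Fact:
    statement: str
    # Both options hang off the same citation-style link list.
    links: List[Union[Source, Analysis]] = field(default_factory=list)


def reasoning_for(fact: Fact) -> List[str]:
    """Return the reasoning texts reachable from a fact under either option."""
    texts = []
    for target in fact.links:
        if isinstance(target, Analysis):
            texts.append(target.text)      # option A
        elif target.kind == "analysis":
            texts.append(target.text)      # option B
    return texts


register = Source("Parish register", "Transcript of the entry")
option_a = Fact(
    "The father of John Doe is X",
    [register, Analysis("Who were the parents of John Doe?", "Argument text")],
)
option_b = Fact(
    "The father of John Doe is X",
    [register, Source("Who were the parents of John Doe?", "Argument text", kind="analysis")],
)

print(reasoning_for(option_a) == reasoning_for(option_b))
```

Either way the fact can be traced back to its justification; the difference is only whether a reader of the data has to inspect a type or a field to tell logic apart from a transcript.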
mmartineau 2011-04-16T11:14:44-07:00
On Friday 8:26am (Mountain Time), Tom presented a concrete solution that elegantly solves the problem Adrian introduced. Does anyone have a reason why this model does not solve the problem? I would like to see more concrete examples like this because it helps me better understand EXACTLY what the person is trying to say. Otherwise, as others have previously stated, it's easy to misunderstand what they mean.
AdrianB38 2011-04-16T11:51:24-07:00
Gene
I wrote, "understand Gene's description of how she worked ... putting a huge amount more into the "citation" .... it works for you and I suspect it does so because you never hit the "Print Family History Report"

And you replied "O.O. I want the logic and reasoning in the citation of my working file because I create family group sheets and narratives. Hope this doesn't take us back to square one. Will you let me know?"

I'm now worried because you're worried. Obviously you create family group sheets and narratives. The question is - how? Do you press the button in your FH program that says "Print Family Group Sheet"? Or do you write them up in a word processor, perhaps pasting big chunks from your database or software produced report?

If you use the facility in your software to produce those Family Group Sheets, how do you deal with the citations? Because if you've put several hundred words of proof into the citation item in your software - as I believe you do - what stops you getting footnotes or bibliographies full of several hundred words, since surely your software automatically generates the reports with the citations in (as bibliography or footnotes, I suppose...)?
GeneJ 2011-04-16T12:22:17-07:00
Hi Adrian:

You wrote, "Obviously you create family group sheets and narratives. The question is - how?"

The short answer is, from my software.

I don't do any editing for the FGS.

In my main project, to the extent I have to edit in a word processor, it's only because the developer took a creative approach to interpreting Register or Quarterly style. If I had that project in GenBox, I might not have to edit a single word. --GJ
GeneJ 2011-04-16T12:48:32-07:00
Adrian wrote, "If you use the facility in your software to produce those Family Group Sheets, how do you deal with the citations? Because if you've put several hundred words of proof into the citation item in your software - as I believe you do - what stops you getting footnotes or bibliographies full of several hundred words, since surely your software automatically generates the reports with the citations in (as bibliography or footnotes, I suppose...)?"

A few points.

(1) I do both end notes and a bibliography for my family group sheets.

(2) Length is not a primary concern to me. If I have 150 sources on a family, well, I have 150 sources.

(3) Needless length is a concern--well written citations almost always take less space than hastily composed citations or disjointed citations. Great researchers tend to have fewer unresolved conflicts, too.

(4) Take off on 3. I find that as you approach an exhaustive search, so many conflicts are resolved or mitigated--what starts out as a long drawn out complex proof often ends up a simple statement.

(5) Take off on 4. The proof requirements that are left are important ones. If that story can't be told well in a tag, it should be already written in a word processor, posted to a blog or maybe even published.

(6) There are techniques, too. If you have a source for the indexed marriage entry, then you obtain the marriage record, do you print out both citations? Maybe you drop the QUAY on the indexed entry citation to zero and then only include citations with a QUAY greater than zero.
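
The technique in point (6) can be sketched as a simple filter. This is a hedged illustration, not how any particular program implements it: the `Citation` class and `printable` function are hypothetical, and only the idea of a QUAY value (0 to 3 in GEDCOM 5.5) comes from the existing standard.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Citation:
    text: str
    quay: int  # GEDCOM-style certainty value, 0 to 3


def printable(citations: List[Citation]) -> List[Citation]:
    """Keep only citations whose QUAY is greater than zero."""
    return [c for c in citations if c.quay > 0]


citations = [
    Citation("Marriage index entry", quay=0),  # superseded by the record itself
    Citation("Marriage record", quay=3),
]

print([c.text for c in printable(citations)])
```

The indexed entry stays in the working file as part of the research trail; it is only suppressed in the printed report.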

All things considered, though, if you are sending a family group sheet to say a historical society, you hope that record will be there long, long after you are gone.

Does this help?
GeneJ 2011-04-16T13:13:12-07:00
@Mike

The group hasn't talked about this, but there is more going on here than meets the eye ... at least I think so.

Geir lives in Norway, where they come at genealogy a little differently. I might not say this the right way, but they have a little more scientific approach (right down to their citation style).

Adrian has a science background and, like Geir, lives in a part of the world where they have reasonably stable and consistent record groups that go way back in time.

We live in a country with states barely 50 years old. Again, I might not say this right, but other than the New England town records (which vary greatly between some towns), lots of states here down have reasonably stable and consistent vital records that go back 100 years. Think of weddings in Pennsylvania--maybe the minister came around once a month and just maybe his little black book is extant. Or not.

Add the complexity of our melting pot. The Scotsman with a heavy accent conveying information to a German town assessor can make for pretty fun reading 100 years later.

I think the work Geir and Adrian are doing is great. There is only one copy of Evidence Explained in Norway--guess who has it?

I'm real appreciative to both for the time they take to try to understand the needs in another part of the world. I only wish I could take more time and answer their questions with more care.

My 2 cents. --GJ
GeneJ 2011-04-16T13:15:50-07:00
Oops "states here DON'T have reasonably stable and consistent vital records that go back 100 years"
AdrianB38 2011-04-16T14:27:06-07:00
Gene - so I think my understanding of the way that you work is:
a) You could have citations in your program's database several hundred words long, if it needs a tricky proof.

b) You print your final documents straight out of your program - plus or minus the odd tweak in some cases to create the correct styling.

c) Therefore, potentially you have bibliography or footnote citations that are several hundred words long.

d) Having said that, if such a proof gets unwieldy to fit into those citations, the option exists to create it as a separate document (which I presume then gets referred to with its own citation).

e) You could have 150 sources for one family. I don't think that worries me - I expected that the number of citations could be high - it was the possible length of a single citation that "did not compute" in my understanding, leading me to fear I was misunderstanding the way you work.

So by using (d), your documents probably don't get that bad for the length of a single citation.

And many citations will never need a lengthy "proof" anyway.
GeneJ 2011-04-16T14:50:42-07:00
@Adrian,

Yes!!

Did you ever get the chance to really see page 1? Here's the outline I had for the proof argument comments there:

(a) Sometimes logic can be adequately conveyed in a rather simple statement
(b) [There are also] Complex proof statement or Nature of Proof <<[Shorter ones in tag; longer to MS Word].
(c) Published Proof Arguments [more valuable when published, including published to Internet].
(d) Case Studies (proof arguments on the BCG Work Samples page) [The case studies are excellent examples of proof summaries that are also intentionally instructional]

I put a lot of work into that response. Let me know if you didn't get a chance to see it. --GJ
gthorud 2011-04-16T16:21:54-07:00
Gene,

It is nice to hear that the way we record proof here is ”scientific” – I guess you are referring to the style which may be more towards a scientific style, but to me what you do in citations is a real science.

The main difference is that here we like to have more of the reasoning and source summaries/excerpts in inline text rather than in footnotes. Footnotes are primarily used to reference where in, and which source, but we also see reasoning/summary/extraction in footnotes together with a reference to the source or to a bibliography. Some footnotes do not contain a reference to a source at all; they may simply be a less important comment. I like this style with more inline reasoning because it is easier to read than having to read a footnote perhaps for each sentence, and it also makes the text look less like a list of events with dates and places. Bibliographies save paper. And you often see acronyms used for repositories, with a list of repositories at the end of the document.

We also tend to use fewer citations. For example, if I know the date when someone was born, and the parish, it takes me one minute to get that on my screen (scanned), and since there is only one archive that holds parish records for that parish (8 such regional archives in Norway), I would not include a reference to that church record at all– UNLESS there is something abnormal, e.g. the confirmation record giving a different date. The same applies to censuses, land records and probates – so a lot of citations are simply not recorded.

Also, and this is my personal style, I tend to use more free text and less text generated by events, since I – and especially my readers – don’t like robot language.

So if I were to build a person over time, creating an evidence-conclusion tree structure for a person, I would record less info as events, and I would start creating my “story” about the person in a note (with reasoning) possibly when I create the first evidence person, and the story would grow as I add to the tree (or even without adding to the tree). Thus, I will not rely only on the mechanics of merging evidence/conclusion persons into conclusion persons, but I will also have a note (possibly several depending on the capabilities of the program) with a story, with reasoning and references to footnotes (citation or just a comment) that would have to creep up the tree, and growing as it creeps up. I would therefore like the program to copy, and possibly merge, such notes from the underlying persons when I create a new conclusion person. I would then add to it, or do a big overhaul if two notes were merged.

In addition to the story to be published, I would also like to have a separate, parallel dimension, where I could record reasoning and whatever that I do not intend to publish – I have previously called these “research notes”, but they could be called whatever.

Also, when considering the few events that I record, I would like to be able to include reasoning (and “research notes”) inline in the sentence produced for that event. So, depending on where a normal note text, containing fact data, for the event would end up in the sentence (it does not always end up at the end), I would like to have two additional types of notes, one with reasoning etc. and one “research note” with reasoning and other stuff not intended for publication. The notes would have to be labeled so the importing program can distinguish between them. There are already programs out there with multi part notes, although not implemented in a very user-friendly way.
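
The labeled note types described above could look something like the following Python sketch. The names (`NoteKind`, `Note`, `Event.render`) and the sample sentences are hypothetical, and the sketch simplifies one thing on purpose: it always appends note text after the event sentence, whereas the post points out that a note does not always end up at the end.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class NoteKind(Enum):
    FACT = "fact"            # normal note text carrying fact data
    REASONING = "reasoning"  # reasoning intended for publication
    RESEARCH = "research"    # research note, not intended for publication


@dataclass
class Note:
    kind: NoteKind  # the label an importing program would use to tell notes apart
    text: str


@dataclass
class Event:
    sentence: str
    notes: List[Note]

    def render(self, include_research: bool = False) -> str:
        """Build the report text, leaving out research notes unless asked for."""
        parts = [self.sentence]
        for note in self.notes:
            if note.kind is NoteKind.RESEARCH and not include_research:
                continue
            parts.append(note.text)
        return " ".join(parts)


birth = Event(
    "He was born in 1795.",
    [
        Note(NoteKind.FACT, "The register gives the date."),
        Note(NoteKind.REASONING, "The confirmation record agrees."),
        Note(NoteKind.RESEARCH, "Check the probate next."),
    ],
)

print(birth.render())
print(birth.render(include_research=True))
```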
GeneJ 2011-04-17T01:14:18-07:00
I knew you could put it just right! TY
" ... if I know the date when someone was born, and the parish, it takes me one minute to get that on my screen (scanned), and since there is only one archive that holds parish records for that parish (8 such regional archives in Norway), I would not include a reference to that church record at all– UNLESS there is something abnormal, e.g. the confirmation record giving a different date. The same applies to censuses, land records and probates – so a lot of citations are simply not recorded."

If my dad and mom were alive, he'd wink at her and say, "Those Norwegians always did get it right."

It was in reading about _Norsk slektshistorisk tidsskrift_ (Norwegian journal) a while back that I noticed the "scientific" reference. :)

http://bit.ly/eHNkTk
"...in line with an amendment the association did in 2001. ... Norwegian Family History Journal will maintain high academic standards. Norwegian Family History Journal is a scientific journal."

http://bit.ly/eHNkTk
http://bit.ly/dRqJGU

The NGS Quarterly, here, publishes more articles than genealogies. As I recall, the Quarterly dedicates one issue per year to genealogies. Other issues are articles and case studies.

:)

I bet your writings are top notch, Geir. -GJ
AdrianB38 2011-04-17T02:57:19-07:00
Gene - so glad I now understand the way you work. I did see your page 1 (despite the best endeavours of Wikispaces and Firefox to render everything on one line) but my interpretations of what I thought was practical were getting in the way of a full understanding. I think your variety of possible proofs is something I need to convey in the process on the main page here.

Geir - I was also interested in your description of how you work. I can relate to a lot of what you say - I do slightly similar things in terms of what I record for "citations" as I find a lot of the on-line sources over-describe - I have no idea what use the NARA microfilm roll number is to anyone on Ancestry, e.g.

But in summary it is even more clear to me that all BG can do is try and come up with a data model that allows the multitudes to do what they want and does NOT enforce linkages or entities that are derived solely from methodologies (as distinct from real life)
testuser42 2011-04-17T04:02:25-07:00
META About the unreadable first page - the Opera browser has a "fit to width" feature (CTRL-F11) that breaks these boxes... but for the sake of others, please add linebreaks in "code" boxes.
AdrianB38 2011-04-15T13:45:41-07:00
"How do Adrian and Geir feel about this?"
I've only just had chance to look at this sequence of posts, having been out most of the day, and I still haven't got my head round them all yet.

I started this page because I was concerned that there didn't seem to be any page yet that drew together research notes, source-records, "citations", and entering the resulting individual's / family's / place's event and attribute data. Was _I_ missing anything in my understanding of what was going on?

For a while I was concerned that I was, because I couldn't understand Gene's description of how she worked. Now, I believe I do - you're simply putting a huge amount more into the "citation" than I ever thought people would. Clearly it works for you and I suspect it does so because you never hit the "Print Family History Report" in your application to get one of those turgid, computer generated reports.

I'm still looking to firm up in my mind how to connect research notes, source-records, "citations", proof argument / proof summaries and the resulting event, attribute and relationship data.

In particular, I want to be able to track backwards from a "final" fact through all the steps that helped me put the stuff together. That means not just what were the sources, and where within the sources did I find the data, but what was the logic I used to show that X's parents were A and B? What did I look at? Right now, I still don't see how I'm going to be linking back to the proof summary / argument / whatever, except that, because linking logic already exists in that direction, I'm hugely tempted to say that there's a source-record in there that is created solely to point to the proof summary / argument / whatever and doesn't represent any physical thing outside in the real world. And I'm not looking forward to the arguments I might get about that concept!

But to pick up on something that Gene alluded to, my data model has to be as process independent as possible - but I can only tease these concepts out in _my_ mind by going through a process.

Err... not sure how much help that is except to say that I'm still here.
AdrianB38 2011-04-15T13:48:56-07:00
PS - I can read page 2 - page 1's layout goes off the page... Groan.
theKiwi 2011-04-15T14:18:23-07:00
Adrian said

"PS - I can read page 2 - page 1's layout goes off the page... Groan"

Same for me with Firefox and Safari on Mac OS X - the gray boxes with the example text in them don't wrap, and so the width of the whole page gets set by the length of the longest of those gray box lines, making reading not so easy :-(
gthorud 2011-04-15T15:49:41-07:00
I think it might be useful to try to get back to the issue that Adrian started out with, but I will try to sum up my view of some of the things that have been discussed above.

From the discussion we know that some think there is a need to store a lot of information in a citation (eg. footnote), identifying the source, where in source, the source’s location, an assessment of its quality, extracts from the source, a summary of the relevant content (Proof summary) and there may be reasoning about the evidence in the source. So everything is in a citation, possibly split between a footnote and a bibliography. A citation may be related to persons, events, groups, ships, places etc.

Then there are those that want to have more of the reasoning, summary and extract as inline text (not in a footnote) in a “proof argument”, possibly a large chunk of text, that can reference many sources (through citations) and other data stored in the database, possibly not DIRECTLY related to the person in question. I am not sure if there has been any statement about where this info would appear in a report, but I could imagine as part of the text produced by an “event sentence” or separate paragraphs in the biography of a person, group, place or ship. Note that a proof argument must be able to reference something that will generate a citation (not necessarily recorded in an evidence person), and that citation could also have a piece of reasoning/summary in it, so there is most likely a need for a citation record type (which must be more general than an evidence person).

Some of the reasoning, summary and extracts may be for “internal use” by the researcher, some may be for publication in a report, possibly requiring different structures. This is important to note.

Then there is the E&C model that wants to store the reasoning in a number of notes for each conclusion person (why are two evidence persons the same), possibly splitting the reasoning in several chunks. It is not clear to me if such a note can reference sources directly or if it has to be indirectly through an evidence person. The model also records the evidence in a structured way, e.g. in event structures, one per person mentioned somewhere in a source. Programs may produce some useful hints to the researcher based on these records, but the main reason I see for having them is that you can cut and paste evidence and conclusion persons into another person – and of course the searching capabilities. Also, the E&C model must be extended to other record types than persons, and it must be able to base conclusions on, and reference, information that is not recorded in the tree of E/C-persons for that person – and conclusions are not only about why you combine two persons into one.

From the discussion there seems to be a conflict between the “all in a citation” and the “research argument” on one side, and the E&C model on the other side. In practice I think a user should be allowed to choose among the alternatives, and will probably use all of them at different times.

BUT, THE QUESTION IS IF THERE REALLY WOULD BE CONCRETE INTERNAL CONFLICTS WITHIN A MODEL THAT TRIES TO SATISFY ALL THESE ALTERNATIVES? IF SO, WHAT CAN BE DONE TO SOLVE THOSE CONFLICTS.

Then, if we try to go back to the start of the discussion, Adrian proposes an Analysis structure that will be used by the researcher during the process that tries to “answer” an objective, and how that structure is linked to other records.

It might be that the data fields in this analysis record could be something that you see only in the user interface (a form) so that most of the fields are actually fields in other records (citation elements, proof arguments, notes in the E&C structure and more), and the analysis record as stored is to a large extent populated by links to the other records. Perhaps the record is not an Analysis record, but part of an Objective and Task record. (This is just an idea that I have not tested.)

I think we should try to establish a model that includes records from the E&C model, Citations, Administration and Adrian's process "model", and it will have to try to satisfy all requirements stated above, and it must be extended beyond persons. (I have some more ideas, but would like to think more about it before publishing anything.)

Finally, if we create a model that can do “everything”, it must be tested with real world examples. It should build the info in the various records, step by step, and should be complex enough to involve reasoning based on many types of sources, related to several persons and other data in the database.
gthorud 2011-04-15T15:51:42-07:00
My last posting was made without reading posting 21 onwards. There is a problem with my browser, so I did not see them. I will start reading.
gthorud 2011-04-15T18:03:35-07:00
I have read the rest of the postings. My problem with not seeing #21 onwards was due to the missing linebreaks on the first page.

I have not really learned much from these postings. I have earlier rejected the idea that a researcher should be modeled as a source, and I don't think it is a good idea to model reasoning as sources. Things should be called what they are, not something else.

I think we have to go beyond the E&C model to do what we want, we should not try to fit everything into that model. The model will play a role in this, but it will not solve everything - its primary purpose is as a tool to merge personas, evidence or conclusion persons, not to record reasoning about everything you will have to reason about when doing genealogy - for example parent/child relationships. It simply does not cover everything.
gthorud 2011-04-15T18:29:38-07:00
A detail. I think it would be useful to be able to have eg. two different "reasoning texts" refer to the same set of "where in source" and "extract" (which in turn refer to the source+repo). A source summary could be together with the "where" or together with the "reasoning" - you could summarise different things depending on what the reasoning is about.
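
This sharing can be sketched by making the "where in source" plus "extract" its own record that several reasoning texts point to. All names here (`SourceReference`, `Reasoning`, and so on) and the sample census data are hypothetical; the sketch puts the summary with the reasoning, which is only one of the two placements mentioned above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Source:
    title: str
    repository: str  # the source+repo that the reference leads back to


@dataclass(frozen=True)
class SourceReference:
    source: Source
    where_in_source: str
    extract: str


@dataclass
class Reasoning:
    text: str
    references: List[SourceReference]
    summary: str = ""  # kept with the reasoning, so it can differ per argument


census = Source("Example census", "Example archive")
household = SourceReference(census, "folio 10, page 3", "John, head; James, son")

parentage = Reasoning(
    "James is listed as son of the head of household.",
    [household],
    summary="Relationship column",
)
residence = Reasoning(
    "The family lived in the parish at the census date.",
    [household],
    summary="Address column",
)

print(parentage.references[0] is residence.references[0])
```

Because the reference is stored once, correcting a folio number or extending the extract updates every argument that relies on it.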
GeneJ 2011-04-16T00:21:17-07:00
Seems there are a lot of different topics in this single thread

Adrian wrote, "understand Gene's description of how she worked ... putting a huge amount more into the "citation" .... it works for you and I suspect it does so because you never hit the "Print Family History Report"
O.o. I want the logic and reasoning in the citation of my working file because I create family group sheets and narratives. Hope this doesn't take us back to square one. Will you let me know?

Geir wrote, "conflict between the “all in a citation” and the “research argument” on one side, and the E&C model on the other side. ...Adrian proposes an Analysis structure that will be used by the researcher during the process that tries to “answer” an objective, and how that structure is linked to other records."
From the fourth posting in the discussion "Introduction," I understand Adrian didn't record his objective in the software, but Admin Research (research log) would have enabled the same; see the example at
re: Administration01 - Research Administration Information
theKiwi Mar 8, 2011 5:13 am
http://bettergedcom.wikispaces.com/message/view/Better+GEDCOM+Requirements+Catalog/35440146
Believe I'll cross post to the other thread and continue discussion there.
ttwetmore 2011-04-16T00:53:18-07:00
Geir,

Needless to say I disagree with you on both points (1--researcher is not a source; and 2--E&C can't handle it all).

The model as I have discussed it has a place for every person/family/group/event/relationship concept, a place for every repository/source/evidence concept, and a place for every research/conclusion/proof concept.

Clearly it is the researcher who makes his/her decisions, based on whatever reasoning he/she is capable of, and those decisions lead to the definition of conclusion persons and other conclusion objects from constituent facts established by the evidence. I believe Mike's example made the researcher the source, not the actual reasoning. The relationship to consider is that reasoning is to researcher as evidence is to source. If you would prefer to call reasoning reasoning and researcher researcher, that is fine, words are just words, but as far as modeling is concerned conclusion entities are justified by researchers and reasoning just as evidence entities are justified by sources and evidence. If the researcher is not the source of the conclusions he/she makes, then, pray tell, what is?

You say that we must go beyond E&C and that it can't handle everything. You say this:

"it's primary purpose is as a tool to merge personas, evidence or conclusion persons, not to record reasoning about everything you will have to reason about when doing genealogy - for example parent/child relationships. It is simply does not cover everything."

That is wrong. The purpose of E&C is to enable the modeling of everything required to do genealogical research, from the information itself, to where the information came from, and to all conclusions derived from the information. It fully includes persons, families, events and relationships, and the processes you must perform to accomplish your research and to make your decisions. To say its purpose is just to merge personas indicates lack of understanding.

I disagree with your comment that E&C can't handle parent/child relationships. We've been concentrating on evidence persons and conclusion persons on this thread, but if you review all that has been said about this model you will see that it covers events and relationships just as well. The evidence for parent/child relationships is established by many forms of evidence, e.g., birth certificates, census records, ship arrival records, obituaries, etc. The E&C model includes evidence event records, and one of the main purposes of these records is to provide the evidence for all types of relationships, including parent/child as just one example. Evidence event records can be used to build conclusion event records in a manner analogous to the person process, which leads directly to parent/child relationships being part of the conclusions. The DeadEnds model, as an example E&C model, supports relationships between persons in cases where there is no direct evidence of the event that created the relationship. For example, if you have evidence that simply states that John was the father of James (nothing about the actual birth event), the E&C model would then have you create two evidence persons for John and James and link those two records together with a parent/child relationship. (Be sure you get that -- a SINGLE item of evidence has been used to extract TWO evidence person records that have a father/son RELATIONSHIP established between them.) So the parent/child relationship is supported by the model and is built into the evidence from the ground up. As you build up the conclusion persons for John and James, you can maintain their relationship into the conclusion world.

You are falling into the trap of believing that the simplicity of the examples being used to demonstrate basic concepts indicates underlying limitations in the model. We have been using examples where evidence only provides information about a SINGLE person. This is just the tip of the iceberg. Just think of a birth certificate. That's a piece of evidence that defines THREE evidence persons, ONE evidence event and THREE different relationships. These events and relationships are all covered by the E&C model just as well as it covers the persons.
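The birth-certificate case can be sketched in a few lines of Python. This is only an illustration of the counts mentioned above, not DeadEnds or BetterGEDCOM code: the class names, the field names, the mother's name "Mary" and the choice of which three relationships to record (father-child, mother-child, and the two parents as a couple) are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class EvidencePerson:
    name: str
    role: str  # role the person plays in this one item of evidence


@dataclass
class EvidenceEvent:
    kind: str
    participants: list  # EvidencePerson objects named in the event


@dataclass
class Relationship:
    kind: str
    from_person: EvidencePerson
    to_person: EvidencePerson


@dataclass
class EvidenceItem:
    citation: str
    persons: list = field(default_factory=list)
    events: list = field(default_factory=list)
    relationships: list = field(default_factory=list)


def extract_birth_certificate(citation, child, father, mother):
    """Build the evidence-level records that one birth certificate yields."""
    c = EvidencePerson(child, "child")
    f = EvidencePerson(father, "father")
    m = EvidencePerson(mother, "mother")
    item = EvidenceItem(citation)
    item.persons = [c, f, m]
    item.events = [EvidenceEvent("birth", [c, f, m])]
    # Assumed reading of "three relationships": two parent/child links
    # plus the link between the two parents.
    item.relationships = [
        Relationship("father-child", f, c),
        Relationship("mother-child", m, c),
        Relationship("couple", f, m),
    ]
    return item


cert = extract_birth_certificate(
    "hypothetical birth certificate", "James", "John", "Mary"
)
print(len(cert.persons), len(cert.events), len(cert.relationships))  # 3 1 3
```

Everything hangs off the single `EvidenceItem`, so the persons, the event and the relationships all share one citation.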

Tom Wetmore
gthorud 2011-04-16T07:13:05-07:00
Tom,

I should have used the terms Evidence and Conclusion Persons, rather than the E&C Model, because that is what has been discussed above.

When you want to have the researcher as a source, what is the purpose? Do you want, for every conclusion or piece of reasoning the researcher makes, to have a footnote that says that the researcher is the source of this?
gthorud 2011-04-16T07:34:12-07:00
GeneJ,

From the 4th posting in Introduction I read that Adrian AT THAT STAGE wanted to discuss the PROCESS independent of how it is reflected in data. When he started this topic I think he crossed the line into the data dimension by introducing an Analysis record.
ttwetmore 2011-04-16T11:03:18-07:00
Geir,

Every genealogical data record should be justified so future genealogists can understand where it came from, whether it's an evidence record or whether it's a conclusion record. When a researcher decides that the persons mentioned in two different source records are the same real person and then joins those records (however it is done), that decision should be justified.

There's nothing in this idea that requires that every mention of a conclusion person generate a footnote.

I assume that any genealogical application supporting this process would add a default justification for each join action, so there would be nothing a user need do when building conclusion trees, unless he/she wants to add a statement to explain the decision.
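A rough sketch of such a default justification, in Python. The record layout, the `join_records` function and the wording of the default text are all hypothetical; the point is only that the join always carries some justification, and the user's own statement replaces the default when one is supplied.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class PersonRecord:
    record_id: str
    name: str
    joined_from: list = field(default_factory=list)  # ids of the records joined
    justification: str = ""


def join_records(new_id, name, records, researcher, reason=None, on=None):
    """Create a conclusion record; fall back to a default justification."""
    on = on or date.today()
    default = (
        f"Joined by {researcher} on {on.isoformat()}; "
        "no explanation recorded."
    )
    return PersonRecord(
        record_id=new_id,
        name=name,
        joined_from=[r.record_id for r in records],
        justification=reason or default,
    )


a = PersonRecord("I1", "Daniel Wetmore")
b = PersonRecord("I2", "Daniel L. Wetmore")

# No reason given: the application fills in the default.
quick = join_records("I4", "Daniel Lorenzo Wetmore", [a, b], "ttwetmore",
                     on=date(2011, 4, 16))

# Reason given: the user's statement is stored instead.
explained = join_records("I5", "Daniel Lorenzo Wetmore", [a, b], "ttwetmore",
                         reason="Same name and same occupation.")

print(quick.justification)
print(explained.justification)
```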

When you look at a typical GEDCOM record today it is made up of many facts (name, birth event/s, death event/s, etc, etc). Typically these facts may come from a variety of sources so there may be many level 2 SOUR lines in the record to state where the facts came from. This is fine, but where in the GEDCOM record is there any overall justification of the integrity of the record as a whole? Just a question. I think we need a better process that doesn't leave this hanging.

Tom Wetmore
ttwetmore 2011-04-14T22:08:26-07:00
Sorry, the citation should have had the page number in it!

"Norwich, Connecticut, City Directory, 1874-1875." McAlpern Publishing Company, 1874, page 44. Daniel Wetmore, ship builder, 34 New London Turnpike. This is likely the Daniel Lorenzo Wetmore from Yarmouth, Nova Scotia.
GeneJ 2011-04-14T22:12:56-07:00
@Tom,

Let's just reverse it. Have it come in through the citation.
GeneJ 2011-04-14T22:13:49-07:00
P.S. We are going to be able to direct those "records" to the research log if we prefer, right?
ttwetmore 2011-04-14T22:43:53-07:00
GeneJ ecrit: "Let's just reverse it. Have it come in through the citation."

I'm not sure what you mean by that. I might guess that in the example I did for Russ you'd want to think of that INDI record as a CITE record instead. Well, the fact is that evidence level INDI record is indeed exactly what you need to have your same citation world in one that accommodates evidence person records. So yes, if you'd rather think of evidence person records as citation records, it would still all come out clean in the wash. We would still need to be able to structure the record with the "person" tags in order to allow all the other advantages to apply, the sortability, the searchability, the suggestibility, and so on. Basically you can't generate a citation without an evidence person record, and you can't have a legitimate evidence person record without the citation fields, so the two concepts are wrapped intimately together in this model. As my father would say, we could call it a fig newton and it wouldn't change what it was.

Ditto she sayeth: "We are going to be able to direct those "records" to the research log if we prefer, right?"

The answer would have to be yes, but the details would differ depending on what Better GEDCOM decides a research log is. I think there are two possibilities, with maybe others in the middle. First, a research log might be an actual Better GEDCOM modeled entity, so there would be records in the database for the entries in the research log. In this case your answer is an instant yes. At the other extreme a research log might be an application level thing that is created by the application from information that exists in other records in the database. I think this could be done very simply also. I think the former approach is better, and I think Geir has already been thinking about some requirements to cover it, and Adrian's latest examples are heading for answers to that also.

Tom Wetmore
hrworth 2011-04-15T05:59:31-07:00
Tom,

I am not sure what the difference is between an Evidence Record and a Citation. I think the two terms contain the same information.

Where in the Source, and What I recorded into the fields provided by my software, were the details of what I saw and recorded.

The Field Name in a database may be different but what gets entered is the same.

You found "Daniel Wetmore, ship builder, 34 New London Turnpike. This is likely the Daniel Lorenzo Wetmore from Yarmouth, Nova Scotia." on page 24.

In my software, that would be entered into the Citation Text field. Page 24 would be in the Citation Detail.

As I understand it, that is an Evidence Record to you. That is a Citation "record" to me.

In the end, the same information would be in the BetterGEDCOM file.

I suggest, that another "attribute" or set of information would be associated with that "Evidence Record" / "Citation", which would be Notes for that record or citation.

What I would suggest, or at least what I would do, is put "This is likely the Daniel Lorenzo Wetmore from Yarmouth, Nova Scotia" in the Citation Notes field.

I am not sure if GeneJ would associate any additional information, but that is how I see this.

Russ
ttwetmore 2011-04-15T06:40:02-07:00
I was in the process of nodding off in front of my screen as I composed my last response to GeneJ, thus horribly written. One of those paragraphs should have read more like this:

In the example I did for Russ you could think of the record as a citation (tag CITE?) record instead of an INDI (evidence person) record. In this evidence and conclusion approach, the evidence level INDI record DOES CONTAIN exactly what you need to support citations. So you can indeed think of the evidence person records as citation records. We would still need to be able to structure the record with the "person" tags in order to allow all the other advantages to apply, the sortability, the searchability, the listability, the suggestibility, and so on. Basically you can't generate a full citation string without an evidence person record, and you can't have a legitimate evidence person record without those citation fields, so the two concepts are wrapped intimately together in this model. As my father would say, we could call it a fig newton and it would still be what it was.

This is a novel idea for me, that an evidence person record is also a citation record. My emphasis on the semantics of this record has always been the fact that it contains structured information about a person taken from a single item of evidence from a single source. But it also contains all the structured and unstructured information required to complete citation strings. I'm willing to let anyone call this record anything they want and declare victory.

Tom Wetmore
louiskessler 2011-04-15T07:12:38-07:00
Yes. Evidence == Source Citation.

See also: http://bettergedcom.wikispaces.com/message/view/BetterGEDCOM+Comparisons/32431138?o=60#32692308

and the discussion surrounding that.

Louis
hrworth 2011-04-15T07:16:59-07:00
Louis,

I continue to believe that the term "Source Citation" is an incorrect term.

I propose that Evidence has two pieces of information. The Source where the information came from (book, online record) and a Citation, where in the Source and What was recorded from that Source.

There are fields, in Evidence Explained, as I have studied it, that define the Source and that define the Citation.

Russ
ttwetmore 2011-04-15T07:26:52-07:00
Russ says:

"I am not sure what the difference is between an Evidence Record and a Citation. I think the two terms contain the same information."

I have come around to the idea that the only difference between them is how you look at them (the blind man and the elephant metaphor applies here).

"You found "Daniel Wetmore, ship builder, 34 New London Turnpike. This is likely the Daniel Lorenzo Wetmore from Yarmouth, Nova Scotia." on page 24... In my software, that would be entered into the Citation Text field. Page 24 would be in the Citation Detail ... As I understand it, that is an Evidence Record to you. That is a Citation "record" to me... In the end, the same information would be in the BetterGEDCOM file."

Exactly. What is important in these combo citation/evidence person records are two things:

1. The structured information (information structured with Better GEDCOM tags that have a defined pattern and defined values) that can be gleaned about the person. I have outlined why having that information is so important for genealogical software that supports the records-based (evidence and conclusion) process.
2. The structured and unstructured information needed to complete the citation.

There is only one thing missing now. And that is, where do you put your actual conclusions that describe why you have brought together a few of these citation/person records into a conclusion person. You CAN'T put that information into the citation/person records because it applies not to them specifically, but as to why you are combining them.

But this is something I have also covered many times in my descriptions of the processes. These conclusions are really the "sources" (I know that word doesn't make sense to some -- but please go back and read Mike's example where he hit the nail right on the head for this one) of the conclusion persons. Please make up another word for "source" if you want to make it more palatable. And note what Mike implies about that source -- that source is YOUR BRAIN.

Let's say you have three citation/evidence records for Daniel Wetmore from city directories in Norwich and New London, Connecticut (next door cities), and you decide they are the same person. You have three evidence/citation records for them, say records I1, I2 and I3. Then your conclusion record could look something like:

0 @I4@ INDI
  1 NAME Daniel Lorenzo /Wetmore/
  1 SEX M
  1 INDI @I1@
  1 INDI @I2@
  1 INDI @I3@
  1 SOUR @I666@  <<-- pretend this is me in the database [note the id]
    2 TEXT The three evidence records were joined together because they each
      3 CONT mention a person with the same name and same occupation, and I have
      3 CONT been unable to find evidence of any other person named Daniel
      3 CONT Wetmore living in southeastern Connecticut during this period of
      3 CONT time.
    2 TEXT In one of these records he is named Daniel L. Wetmore. I believe this
      3 CONT person is the Daniel Lorenzo Wetmore, born in Yarmouth, Nova
      3 CONT Scotia, because land records in Nova Scotia state he sold his
      3 CONT property in 1868 after he had removed to Connecticut.

Now imagine the wonderful citation that could be generated for this conclusion person. Because you have this two-tiered structure, you have access to the three separate citations of the three individual records, but you also have access to this text that can describe your research and conclusions.

To take this a little further, there is no need to restrict this way of structuring person information into just a two-tier system. Two of the three systems I mentioned for GeneJ (nominal record linkage and the NewFamilySearch tree) are limited to two tiers, but the ZoomInfo.com system uses a multi-tier system. I could explain why this is mandatory in that application, but it isn't too germane right now.

To see how a multi-tiered approach works in this example, please imagine that I had already researched and figured out all the Daniel Wetmores living in Nova Scotia during this time period, so imagine that I already have conclusion person records for these Daniel Wetmores. Now I want to COMBINE the conclusion person record that holds all the Daniel Lorenzo Wetmore records from Nova Scotia with the conclusion person record just created that holds all the Daniel Wetmore records from Connecticut. I just create a new conclusion person like this:

0 @I7@ INDI
  1 INDI @I4@  <<-- the conclusion person we created above.
  1 INDI @I9@  <<-- the conclusion person for the Nova Scotia persons.
  1 SOUR @I666@  <<-- it's me making the conclusion again
    2 TEXT ...   <<-- my words on why these two persons are probably the same

I hope you can visualize what a wonderful solution this provides. You have all your evidence, all your citations, all your conclusions/decisions, all your research statements, all bound up into a simple structure of information, and all that information is COMPLETELY accessible for computer processing.

Note that this does not really MERGE the two conclusion persons. We still have available to us the reasons why we combined the Nova Scotia records and the reasons why we combined the Connecticut records! We lose no information. And when we decide we made a mistake, that these are not the same Daniel Wetmore in the two places, we simply remove the top, third-tier record, and we have lost nothing; we're simply back to having a Nova Scotia Daniel Lorenzo Wetmore and a Connecticut Daniel L. Wetmore.
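The tiered structure and the "remove the top record" undo can be sketched in Python as follows. The dictionary layout and the function names are assumptions for illustration, and record I8 (a single Nova Scotia evidence record under I9) is invented so the example has something to point at; the ids I1 to I4, I7 and I9 follow the records shown above.

```python
# Each record lists the ids of the records it was built from ("parts").
# Evidence-level records have no parts.
records = {
    "I1": {"name": "Daniel Wetmore", "parts": []},
    "I2": {"name": "Daniel Wetmore", "parts": []},
    "I3": {"name": "Daniel L. Wetmore", "parts": []},
    "I4": {"name": "Daniel Lorenzo Wetmore", "parts": ["I1", "I2", "I3"],
           "reason": "same name and occupation in the Connecticut directories"},
    "I8": {"name": "Daniel Lorenzo Wetmore", "parts": []},  # hypothetical
    "I9": {"name": "Daniel Lorenzo Wetmore", "parts": ["I8"],
           "reason": "Nova Scotia records"},
    "I7": {"name": "Daniel Lorenzo Wetmore", "parts": ["I4", "I9"],
           "reason": "probably the same man in both places"},
}


def evidence_under(records, record_id):
    """Return the ids of all evidence-level records beneath a record."""
    parts = records[record_id]["parts"]
    if not parts:
        return [record_id]
    found = []
    for part in parts:
        found.extend(evidence_under(records, part))
    return found


def unjoin(records, record_id):
    """Remove a conclusion record; the records it joined are left untouched."""
    if not records[record_id]["parts"]:
        raise ValueError("refusing to remove an evidence-level record")
    return records.pop(record_id)["parts"]


print(evidence_under(records, "I7"))  # ['I1', 'I2', 'I3', 'I8']

# Decide the Connecticut and Nova Scotia men are different after all.
unjoin(records, "I7")
print(records["I4"]["reason"])  # the second-tier reasoning is still there
```

Because joining only adds a record that points at others, undoing a join is a single deletion and nothing beneath it has to be reconstructed.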

Don't you just love it? It's as fine as Carolina!

Tom Wetmore
GeneJ 2011-04-15T08:29:54-07:00
Morning Tom, Louis, Russ:

Looks like we are closer to a functional understanding that our reference notes [:)] are the clearing house for all the evidence.

How do Adrian and Geir feel about this? --GJ


P.S. I had started an article for the wiki to look at the many ways you can categorize the parts of a citation.
Some time back, Ancestry Insider responded to our own Adrian with a humorous take on some of the differences in the entry, "Of Sources and Citations: All Bets Are Off."
http://ancestryinsider.blogspot.com/2010/05/of-sources-and-citations-all-bets-are.html

Mills uses the term "citation" to refer to a reference note (footnote, end note), a source label and a source list entry.

The article for the Wiki got hung up on terms to better distinguish between "source" and the "citation" parts of a referenced note in a data-based application. Hope we can work on that in the EE & GPS Support area when we have the green flag again.
hrworth 2011-04-15T10:14:52-07:00
Tom,

I think one of the issues, on this Wiki, is the use of some of the terms that are used.

Conclusion Person, Evidence Person, etc.

The issue is Common, End User Terms (both experienced, and not so experienced). The EP and CP are terms, that folks like me, a commoner, do NOT understand.

I think that those terms do NOT belong in this Source, or Citation, or Evidence discussion. What 'role' or term you use for a person is defined or should be defined elsewhere. As long as a Citation, referring to some aspect of that person, can be identified, and that Same Citation can be referred to a Source, then when packed or unpacked by the application, can rebuild what the originator had in the sending software program.

Name of person, found in a Book, found on a page is what, I hope, we are talking about here.

Details of the Book (source) are a series of Fields, as suggested by Evidence Explained.

Where in the Book, and What in that Book are then, probably in a free form or perhaps 2 fields, make up the Citation.

That Citation may also carry some Notes, or observations, as you suggested.

The Role or Definition of that person (Evidence, Conclusion, etc), would be handled elsewhere, in a different part of the BetterGEDCOM file.

But, I do think and hope we start to use more commonly used terms.

Thank you,

Russ
ttwetmore 2011-04-15T11:23:00-07:00
Russ,

Terminology is such a personal thing. We probably argue more about that than anything of real substance.

It seems we have taken some steps to unify the concepts of an evidence person and a citation and possibly reference notes. This unified concept is composed of, I think, three things:

1. Actual information from an item of evidence that has been EXTRACTED AND STRUCTURED into a form that holds, for example a person's name and details about the person (sex, age, birth info, parent info), the exact nature and extent of that structured data being dependent upon the information available in the evidence.
2. A reference to the source the information came from with other structured and unstructured information about the evidence this record concerns, intended to be used in and as parts of citations.
3. Research notes the user has attached to the record based on what research and thinking has caused him/her to realize about this evidence.
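As a sketch only, the three parts could sit in one record like this (Python; the class and field names are hypothetical, and the sample values are the Norwich directory entry quoted earlier in the thread):

```python
from dataclasses import dataclass, field


@dataclass
class EvidenceRecord:
    # 1. Information extracted from the evidence and structured.
    name: str
    facts: dict = field(default_factory=dict)
    # 2. Reference to the source, plus detail used in citations.
    source_title: str = ""
    citation_detail: str = ""
    # 3. Research notes the user has attached to the record.
    notes: list = field(default_factory=list)

    def citation(self):
        """Assemble a simple citation string from part 2."""
        parts = [self.source_title, self.citation_detail]
        return ", ".join(p for p in parts if p) + "."


rec = EvidenceRecord(
    name="Daniel Wetmore",
    facts={"occupation": "ship builder",
           "residence": "34 New London Turnpike"},
    source_title='"Norwich, Connecticut, City Directory, 1874-1875." '
                 "McAlpern Publishing Company, 1874",
    citation_detail="page 44",
    notes=["This is likely the Daniel Lorenzo Wetmore from Yarmouth, "
           "Nova Scotia."],
)

print(rec.citation())
print(rec.facts["occupation"])
```

Whether one calls `EvidenceRecord` a person, a citation or a research note, the stored fields are the same; only the view of them differs.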

This unifies the three concepts pretty well. But people like me think of this object as a "person", well maybe a partial person or a limited person, but a person none the less. Yeah, we've attached source and citation info to it because we're good researchers, and we want to say where it comes from, but we still think of these things primarily as records that convey information about persons. In the nominal record linking process these records are actually called "nominal records", primarily because they are often times not much more than just a name with a bare fact or two. In the NewFamilyTree application they are called "persona" records to distinguish them from the full-bodied conclusion records. In the ZoomInfo application we sometimes called these things "mentions," which might be pretty similar to the term citation actually. In the up and coming records-based paradigm shift coming to the genealogical world they are simply called "records," where here the word record doesn't mean a database record, it means a piece of evidence, as in going to city hall and getting a record!!! Talk about confusing everything once again.

Then there are people like you and GeneJ who, I believe, see these records primarily as the sources of information that will eventually be citations in reports. I think for you there are lots of facts about a person in there, but you don't really think about "final" persons until you make your conclusions, and kind of as a last step create conclusion persons in a database. It seems you think of there being two separate worlds. You work in the evidence/citation/research world until you are ready to make strong statements about the persons who existed, and then move into the person world when you construct conclusion records as a step near the end of your process. Of course, I'm not a mind reader and may have it all wrong.

Then maybe there are others who want to call these records research notes or research log entries and have a different perspective. I don't understand their point of view so can't say more.

I also think as a software developer who would have to implement software to manipulate these objects. From that point of view, as a software architect, I would instantly recognize the fact that the internal data structures needed to hold evidence persons/citations are so similar to the data structures needed to hold conclusion persons, that I would define and use one data structure for the two of them. Probably 98 out of 100 software developers defining this data structure would do the same thing and even call that data structure a Person. The other 2% would probably call it an Individual. So all these things prejudice the terminology I prefer.

I am interested in good terminology, but I am more interested that we have the right concepts firmly embedded in the Better GEDCOM foundation. I would concede on terminology any time it assured the acceptance of a solid concept. If we end up calling these things citations, which I'll say up front I don't like, it would have no impact on the data that could be stored in a Better GEDCOM file, nor would it have any impact on the user interface to a software application that used the Better GEDCOM concepts. It just boils down to "what sounds right" to the various of us. These records don't "feel" like citations to me, but rather records that contain lots of information useful in generating citations. They do "feel" like persons to me, because the software needs to index them under their names, under the locations mentioned in them, under the dates when things happened, and then the algorithms must process them in the context of dealing with them as holding information about persons.

Tom Wetmore
GeneJ 2011-04-13T18:38:03-07:00
Adrian,
"In fact, if you think about it, if I used an article from a learned genealogy society "Who were the parents of X?" as my source, I'd be better off because the events and attributes would point to the article and all the logic would be in there. Whereas if I've done the primary research, the events and attributes would, under the current GEDCOM inspired structure, ONLY point to the sources and NOT to the logic explaining how to combine the structure."

"the events and attributes would point to the article and all the logic would be in there"

See, in my practice, the events and attributes point to my citation and all the logic is THERE.

I don't think we'll get a good comparison if we try to use the citation system in GEDCOM 5.5 (15 years old).

I know you haven't been able to attend the more recent Developer's meetings, but between those meetings and the postings to the EE & GPS support page, we think we've established that GEDCOM 5.5 does not transfer citation data well.
See Application Overview
http://bettergedcom.wikispaces.com/Application+Overview#Application%20source%20systems%20functionality

If I got your comment wrong, let me know.

As well, BetterGEDCOM is well on the way to documenting the substantial differences between how different applications handle citation data. There is a summary at Application Overview. We've documented some of the detail at Application Data.

You'll note on that page, other than the "GEDCOM based group," most of the applications we have looked at will form a citation that includes all the evidence and all the comments needed to support the logic to which you refer.

Separately, I only use software applications that allow me to make a full record of my evidence. For me, it is the number one consideration in deciding which application I will use, and it's pass-fail.

Why is this so important to me?
Without that information in my citation, I wouldn't be able to produce a family group sheet or drafts for narratives/biographies.

I need all the evidence in my citations, and I need all my logic and all my reasoning in that citation.

The last family group sheet I produced had 139 citations. The biography I'm working on now has that many _excluding_ vital records.

It's really important to me that my citation data not end up under the hood or automatically combined somewhere.

Please let me know if I was not following your intent.
ttwetmore 2011-04-13T22:36:37-07:00
GeneJ,

You use the term "citation" in an unconventional way. A citation is a text string, often formatted by the rules found in some templates defined by standards, that describes where items of evidence can be found. When you say "I need all the evidence in my citations, and I need all my logic and all my reasoning in that citation" you are saying something that doesn't jibe with this definition and something that sounds confusing. I think you are using the term citation when you mean, or at least most of us would use, the term source. This may clear up some misunderstanding.

Your method of storing your evidence and your research results in sources serves you well. But it is not the only way. The notion of an evidence person is simply the idea of extracting data about a person from the evidence, which can remain unchanged in the source records, and putting it into its own person record. Your source records haven't changed. But now you have some very useful records in your database that you can do very powerful things with. Evidence persons can be searched, displayed in lists of many kinds and many criteria, and, importantly, rearranged in ways that let you experiment with the ramifications of creating final conclusion persons from different sets of evidence. You can use the computer to aid you in your thought processes as you go through all the "what if's" that need to be investigated when deciding how the evidence fits together. You are an expert at doing this kind of reasoning in your head. Having the evidence persons simply allows your computer to be a tool that can help you do this more effectively. If you hypothesize that two items of evidence might be from the same person, a system aiding you could automatically flag the fact that the hypothesis would cause all kinds of inconsistencies in the names of children or in the places of residence. Doing it in your head, of course, you'd figure it out too, but with the computer as an assistant, it could be done much more quickly and maybe more painlessly.

And the very best thing about this approach is that your applications can now take a more proactive role and actually suggest ways to combine the evidence that would maximize or minimize different criteria. At this point our genealogical applications leave the realm of simply being nice conclusion handlers to being true expert assistants.

Wouldn't you find it useful if you could ask your program to find all the sources that have evidence about persons with a certain name, or who were born in a certain year, or who had a spouse with a certain first name, and so on? Would it be nice to ask your program to show you the sets of all evidence records that have more than a certain threshold of probability of being the same persons? These are just a few of the "add-on" capabilities that come with evidence persons and smart programmers.
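A toy version of those two queries, in Python. The record layout, the function names and the James Smith record are made up for illustration, and plain name similarity from the standard library's `difflib` stands in for a real probability of two records being the same person, which would have to weigh dates, places and relatives as well.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical evidence person records, one per item of evidence.
evidence_persons = [
    {"id": "I1", "name": "Daniel Wetmore", "occupation": "ship builder",
     "place": "Norwich, Connecticut"},
    {"id": "I2", "name": "Daniel Wetmore", "occupation": "ship builder",
     "place": "New London, Connecticut"},
    {"id": "I3", "name": "Daniel L. Wetmore", "occupation": "ship builder",
     "place": "Norwich, Connecticut"},
    {"id": "I5", "name": "James Smith", "occupation": "farmer",
     "place": "Norwich, Connecticut"},
]


def find(persons, **criteria):
    """Return the records whose fields match every criterion exactly."""
    return [
        p for p in persons
        if all(str(p.get(key, "")).lower() == str(value).lower()
               for key, value in criteria.items())
    ]


def likely_same(persons, threshold=0.8):
    """Pair up records whose names are at least `threshold` similar."""
    pairs = []
    for a, b in combinations(persons, 2):
        score = SequenceMatcher(
            None, a["name"].lower(), b["name"].lower()
        ).ratio()
        if score >= threshold:
            pairs.append((a["id"], b["id"], round(score, 2)))
    return pairs


print([p["id"] for p in find(evidence_persons, occupation="ship builder")])
for pair in likely_same(evidence_persons):
    print(pair)
```

Neither query needs any conclusions to have been made yet; both run directly over the evidence-level records.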

The idea of evidence persons doesn't invalidate any of your requirements, and it doesn't force you to change any of your practices. It simply enables a more powerful set of capabilities that you can access from within your genealogical system.
AdrianB38 2011-04-14T08:34:24-07:00
Gene
Not sure we're on quite the same page, as it were.

First - the bits where we agree:
- "we think we've established that GEDCOM 5.5 does not transfer citation data well". Indeed. You must excuse me in this since I know not everyone can read standards like the GEDCOM one, but having read the bits of it relating to citations, and compared those bits to the range of data that people want (justifiably) to enter into "citations", then it was clear to me right from that moment that GEDCOM was guilty of inadequate transfer and I didn't and don't need any more convincing! I totally agree!

- "I only use software applications that allow me to make a full record of my evidence". Good.

Now come the bits where I don't think we're understanding one another.
- "in my practice, the events and attributes point to my citation and all the logic is THERE"
and
- "I need all the evidence in my citations, and I need all my logic and all my reasoning in that citation"

My problem is that I don't understand how you can possibly fit all your logic and all your reasoning into a citation. And since I certainly believe you've got all your logic and all your reasoning somewhere, then we clearly are at cross purposes somewhere in the understanding of one or more of "all", "logic and reasoning" or "citation".

Let me try and make a better job of defining and / or understanding:

Re "logic and reasoning" - I think I've found the sort of term to describe what I've been talking about. On http://www.bcgcertification.org/skillbuilders/index.html it makes reference to Proof Arguments and Proof Summaries. In particular, I liked Barbara Vines Little's article on http://www.bcgcertification.org/skillbuilders/skbld099.html for her discussion of when to use summaries and when the full argument.

So, in my attempt at a "Fact Research Process" on the page to this discussion, my stage 6 refers to "analyse results to see if high-level objective has been met. Document that analysis. ... Output ...
* Results of analysis
* Final conclusions – if any"

The output from this stage, it seems to me is therefore a Proof Argument or a Proof Summary. _That_ is the sort of thing that I've been rambling on about recording. That is the sort of thing that I would like to see recorded in my genealogy app as the logic and reasoning. Storing that is what _I_ meant by recording the logic and reasoning.

Re "all" - And, looking at the level of detail in the sample proof argument for the father of Sidney (Withrow) Lusher, then I want to see ALL that detail in my app.

No, it won't be the same words in the same format because I'd be writing it just for me and so it would be a lot more abbreviated and the sources might well be loosely described rather than cited properly because the proper citations would be available elsewhere. But, in terms of the information content, that's what I want to record.

Re "citation" - which leads me to the last of the 3 possible points of confusion between us. When you say "citation" I'm thinking of a classic bibliography / foot note / 2nd footnote, which can indeed usefully have an extra sentence or two. But - that's just a sentence or two. Trying to get all the proof argument or proof summary, particularly the 1,742 words of the Sidney (Withrow) Lusher proof argument into _that_ sort of a citation doesn't make sense. (I pasted the argument into MS Word and let it count!). So either you're putting less than the proof argument or proof summary into the citation, or your citations really are that big, in which case I don't imagine you're going to be using the printed reports from your genealogy app!

So where's the confusion between our views coming from, Gene? Like I say, I totally believe you've got it all somewhere, but it seems to me that when you say "I need all the evidence in my citations, and I need all my logic and all my reasoning in that citation" then either your logic and reasoning isn't yet assembled into a proof summary or a proof argument at this point; or your citations are many orders of magnitude bigger than mine.

Or is there something else I'm missing????
GeneJ 2011-04-14T14:11:51-07:00
@ Adrian:

Please let me know if this helps. --GJ

(1) You wrote, "didn't and don't need any more convincing" -- didn't intend to suggest you had, but others who observe the wiki might not. Ditto, I didn't recall that you had been able to attend the couple of Developers Meetings when this particular point was discussed.

(2) "Proof Argument or a Proof Summary. _That_ is the sort of thing that I've been rambling on about recording."

See also Requirements Catalog, Evidence02
Not all authors will assess the same circumstances the same way. Not all circumstance is the same, not all proofs are equal. Not all writers of proofs are equal.
Little's work is excellent, isn't it?
My thoughts follow.

(a) Sometimes logic can be adequately conveyed in a rather simple statement. Here's a case from my working file that some might consider resolved. I consider it to require a proof because the conflict vests in a highly regarded work recently published. The statement below is from my working file:

Scott Andrew Bartley, ed., _Vermont Families in 1791_, vol. 2, pg. 165, reports death of Hannah 04 Mar. 1797, Rumney; citing VR, submitter
Sprague attributes death to Hannah, dau. William Preston and Hannah Healey; however, William and Hannah's dau. Hannah had married 1774 to
Asahel Brainerd, resided Rumney at 1790, and lived to be about ae 88. The Hannah who died at Rumney in 1797 was the 10-mos. old dau. of
William and Elizabeth, and twin of Joseph. The 1797 NHVR (Rumney) confirms the parents of Hannah then deceased to have been William and
Elizabeth Preston.

I have hundreds of statements. To me, the above is a simple-styled proof.

(b) Complex proof statement or Nature of Proof.

Not all authors will assess the same circumstances the same way. Not all circumstance is the same...
Trying not to oversimplify ... When a "proof" rises to the level of real "identity" (even saying that oversimplifies this), I want the proof to be obvious. I use a tag. Even when it doesn't, if the proof is complex (will refer to multiple persons and multiple events about those persons and multiple sources for each) I use a tag (see Evidence02). If I require better formatting or just more power, I use MS Word.

I know you said you like the 1700 word proof, but I fail to see how that is appropriate for all circumstance--maybe we could discuss determinants of length and applications of length separately? As a general rule, length is less of a factor when I decide where I want to put the proof. Other than length, there are _several_ narrative and evidence/narrative/proof factors, too, that authors weigh, about which there are principles.

At the very end of this message, I'll post the old tag proof that was once in my working file about little Hannah. It wasn't the last version I had of that tag proof, but I hope it serves for this purpose.


(c) Published Proof Arguments.

Because of their nature, proof arguments make good "content" as far as publication, and publishing them can add to their value (a form of peer review, etc.)
There are more and more opportunities to publish. Some folks have started using blogs or their own internet sites to publish proofs. If you allow comments/feedback, then you have at least invited some peer review.

If the proof you have written is published externally, it simply becomes another citation in your file. See (2) If that proof is written in MS Word and not a part of your file, you would likewise create a citation for it.

See also Helen S. Ullmann, CG, FASG, has a GEDCOM posted at WorldConnect. See “Helen Schatvet Ullmann, CG, FASG, GEDCOM <http://wc.rootsweb.ancestry.com/cgi-bin/igm.cgi?db=hsullmann>"


(d) Case Studies (proof arguments on the BCG Work Samples page).

The case studies are excellent examples of proof summaries that are also intentionally instructional. As I recall, there are some explanatory comments that describe the unique presentation of case studies. If you didn't see that, let me know.

I suggest the GPS standard in your working file is not so high as the writing quality and instructional nature of a Case Study, but if that is the standard you set for yourself in your file, I wrote Evidence02 so that it would get you there.

I don't see how the model you use influences the requirement. See Evidence02. (I made the same comment on the discussion for the requirement.)


(3) You wrote, "fit all your logic and all your reasoning into a citation ... When you say "citation" I'm thinking of a classic bibliography / foot note / 2nd footnote, which can indeed usefully have an extra sentence or two"
Ignoring nuances, when I create a biography using best practices, my citations represent the record of all of my evidence. As a general rule, length is not a consideration in my working file. If I have a chance, I'll find the professional reference, but ... Genealogical citations are long, and in a working file, they can be more so. Experts are likely more proficient than those of us who aren't pros.


Hannah Preston (1312) (b. 6 May 1796, d. 4 Mar 1797); >>Proof Notes>>
HANNAH^6 PRESTON was born 1796 at Rumney. She is generally reported to have been the twin of Joseph Preston, both children of William Preston and his wife, Elizabeth. According to _Vermont Families in 1791_, one Hannah Presson/Preston died at Rumney in 1797, and the source associates that death with Hannah^6 (William^5, William^4 Presson, William^3 Presbury...) [1]; however, compiler herein attributes the 1797 death to Hannah(1)^7 (Maj. Wm^6, William^5, William^4 Presson, William^3 Presbury...) [2]:

1) It was a common practice of the times to re-use a name if one child born earlier had died young. Hannah was born in 1796; and the next birth of a dau. to the Maj. occurred in 1808,--she was named Hannah also. [1]

2) William Preston and his wife Hannah Healey removed to Rumney in about 1768 [1], and they remained there until about 1785 [1]. They are the only family of the surname and variants known to have resided at Rumney in the pre-revolutionary time period, [9]and the family was frequently recorded early at Rumney as "Presson." William and Hannah's dau. Hannah, born 1756 (second or third child, the just younger sister of Maj. William Preston) [3], almost certainly resided with the family at Rumney and would have been aged 18 in 1774 when Asahel Brainerd of Rumney married a woman called "Hannah (Presson) (Preston)" and “of Rumney” by Brainerd family historians in 1908. Asahel was the son of Daniel Brainerd, considered one of two founders of Rumney at the time it was re-granted. [5]

3) Lucy Abigail Brainard, The genealogy of the Brainerd-Brainard family in America : 1654-1908 (1908), writes that Asahel Brainerd's wife Hannah (Presson) (Preston) died 1844, then ae 89.[5] Using almost identical language, George W. Burch, compiler, Ancestry and Descendants of John Russell Haynes (1924) describes the same marriage and writes that she died in 1844 at age 84. The origin of the conflicting ages reported by the two texts is not known, however a woman described as ae 84 in 1844 suggests birth c1760, and about ae 14 in 1774 when Asahel married Hannah. While women did marry young, none of the known sisters of Hannah^6 married so young. Hannah’s just younger sister Mary m. c1784, then about ae 26; and her sister Elizabeth married the same year, then about ae 22. Hannah’s younger sister Hitty married in about 1790, about ae 22. [Separate Research]In the alternative, a woman ae 89 in 1844 (b. c1755), would have been about 19 at the time of that marriage, and that age correlates favorably with the date of birth reported for Hannah^6, b. 25 March 1756 [VR]. Based on the vital records, Hannah^6 was ae 18 yr., 5 ms., and 19 ds. at the time of the Asahel Brainerd marriage. [7]

4) Finally, if Maj. William's sister (Hannah^6, b. 1756) survived to adulthood and remained single, as _Vermont Families_ speculates, the odds seem greater Hannah^6 would have removed to Strafford with the parents and most of her siblings in the mid-1780's or sometime thereafter, and less that she would have remained at Rumney for so many years beyond to have died there in 1797.


[1] Scott A. Bartley, _Vermont Families in 1791_ vol. 2 (St. Albans, Vermont: Genealogical Society of Vermont, 1997), pages 165-167 for the family of William and Hannah (Healey) Preston (Presson), submitter Sprague cites VR as source of her assertion that Hannah^7 died in 1797.
[2] New Hampshire, Registrar of Vital Statistics, "Index to births, early to 1900;" database of extracted records, Family History Library, FamilySearch.org (www.familysearch.org : accessed 15 June 2006), for Hannah Preston, born 06 May 1796, at Rumney Twp., Grafton County, parents William Preston and Elizabeth, cites batch 7540069; FHC Source Call No.: 1001028 (film)
[3] New Hampshire, Registrar of Vital Statistics, "Index to births, early to 1900;" database of extracted records, Family History Library, FamilySearch.org (www.familysearch.org : accessed 15 June 2006), for Hannah Presson, born 25 March 1756, at Chester, Rockingham County, parents William Presson and Hannah Helay, cites batch 7540069; FHC Source Call No.: 1001028 (film)
[4] It was not so large a place. In 1775, a year after Asahel and Hannah married, there were eleven men in the army [a]. Even later, almost ten years after their marriage, a 1783, poll tax statement [b] reported there were then 50 males aged at least 21 years residing at Rumney. [a] Isaac W. Hammond, A. M., _Provincial and State Papers_, vol. XIII (Concord, N. H.: Parsons B. Cogswell, State Printer, 1884), page 354 for “Rumney,” cites vol. XI, p. 729; [b] _Ibid._, page 357 for “Return of Ratable Polls, 1783” ; digital images _GoogleBooks_ (http://books.google.com : 4 Dec 2007).
[5] Lucy Abigail Brainard, The genealogy of the Brainerd-Brainard family in America : 1654-1908 (1908), vol. I-part III, pages 44-5 (parents); 52-3 (marriage and family), for Asahel^4 Brainerd and Hannah (Presson) (Preston); digital images, Ancestry.com (http://www.ancestry.com : accessed 4 December 2007;) compiler Brainard writes that Asahel^4 Brainard (Daniel^3, Joshua^2, Daniel^1) "of Rumney, Grafton Co., N. H.; m. Sept. 13, 1744, Hannah Presson (Preston), of Rumney"; he died 1813, "in his 59th year"; she died 1844, "in her 89th year.")
[6] George W. Burch, compiler, Ancestry and Descendants of John Russell Haynes, (Hartford, Conn., 1924.) Asahel Brainerd (Fourth Generation); "Asahel Brainerd (Daniel-3, Joshusa-2), born in Milligton, East Haddam, November 1, 1754. His family moved to Rumney, N. H.; married September 13, 1774, Hannah Presson (Preston) of Rumney; died in 1813, in his 59th year. Hannah died in 1844, in her 84th year."
[7] Date calculator, The Master Genealogist.
[8] Who's this? Hannah Preston, b. MA, aged 76; living with Elijah Smith and apparent wife Salome at Winchester, Cheshire Co., New Hampshire, 1860.
[9] Page by page review of Grafton County Deeds (see research memorandum), NHVR, Telephonic interview of Rumney Town Clerk and also of I. Kemp (see related research memorandums).
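
The age given at the end of step 3 can be checked without a genealogy program. Here is a minimal sketch in Python, assuming the vital-record dates quoted above and a simple years/months/days convention. The function name age_ymd is purely illustrative, and The Master Genealogist's date calculator may borrow days differently in edge cases.

```python
from datetime import date
import calendar


def age_ymd(born: date, on: date) -> tuple:
    """Return the elapsed (years, months, days) between two dates."""
    years = on.year - born.year
    months = on.month - born.month
    days = on.day - born.day
    if days < 0:
        # Borrow the length of the month that precedes the end date.
        months -= 1
        prev_month = on.month - 1 or 12
        prev_year = on.year if on.month > 1 else on.year - 1
        days += calendar.monthrange(prev_year, prev_month)[1]
    if months < 0:
        years -= 1
        months += 12
    return years, months, days


# Hannah^6, b. 25 March 1756; Brainerd marriage 13 September 1774.
print(age_ymd(date(1756, 3, 25), date(1774, 9, 13)))  # (18, 5, 19)
```

The same function applied to the infant Hannah (b. 6 May 1796, d. 4 Mar 1797) gives 9 months and 26 days, consistent with the "10-mos. old" description.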
GeneJ 2011-04-14T14:12:46-07:00
p.s. Knowing how much you love formatting .. much in that proof was lost on transfer
GeneJ 2011-04-14T14:50:06-07:00
@Tom,

Please let me know if this does not answer your questions. --GJ

(1) You wrote, "You use the term "citation" in an unconventional way" and "clear up some misunderstanding."

If you have the chance, let me know what you think I misunderstood. A few weeks ago in the Developers' meeting, I was asked to describe the differences between us, and led to believe I was spot on. In the context of the BetterGEDCOM definition, I need all the evidence recorded in my citations ... For examples of citations, see EE & GPS Support > "About Citations"

GEDCOM 5.5 uses the term differently than I did and differently than BetterGEDCOM. See the EE and GPS Support part of the wiki; ditto EE definition in either or both the main definitions or the EE supplement.* (The concept of the BetterGEDCOM term was extended into "citation element" and "citation template." See BetterGEDCOM Definitions under Development. The latter terms were discussed and tweaked during the Developers Meeting two weeks ago.)

(2) As to what I'm pretty sure you call the "source," GEDCOM 5.5 defined it as a bibliographic entry. That definition is out of date. See Evidence Explained 2007 or 2009. The search phrase "collection as lead" should return a group of examples, but there are many others. In short, bibliographic entries are often considerably more general than what some might call the "source" found in the full reference note citation.

(3) You wrote, "ask your program to find all the sources that have evidence about persons with a certain name"
I quickly looked over your list (it was very late here, I didn't want to think about and/or parameters). For the purpose of your point, let's say I can already perform the sorting and filtering I need. Because of that, I see sorting and such more about how the application is structured rather than what the model is. With good citation elements and templates, I'd think most models could enable complex queries.

(4) You wrote, "But it is not the only way."
I'm at the front of the line on that one. As in any discipline, however, there are best practices, which are well documented and supported by a whole body of topical material--these are things that have been used, abused, and widely peer reviewed.

I read the balance of that paragraph and the next, and I don't doubt that you believe in all those features and benefits.

Without elaborating, I am concerned about underlying prototype/application methodology said to be the basis of the model. That methodology has not been objectively tested, documented, published and thoroughly peer reviewed.
We said BetterGEDCOM would be unbiased and transparent. Sometimes we talk now about taking more of a scientific approach to genealogy.

I know that you believe in all those features and benefits. I'm just a genealogist who'd like to pull the source of the source and kick some tires. I think "automatic combination of genealogical records" is behind the left rear.
ttwetmore 2011-04-14T18:45:34-07:00
GeneJ,

(I never said you misunderstood anything -- I said there was a misunderstanding.)

I would say the main difference is this.

I see a citation as a formatted text string that describes where an item of evidence can be found. It may also summarize succinctly something about the evidence it contains, or it might contain a succinct statement about the quality of the source. I don't see a citation as a self-contained record in a database. Rather I see it as a string of text that is generated on demand from information that is stored in a database using templates say, that come from ESM or other standards groups. I definitely don't see a citation (or a source) as record in a database that also serves as a large container where I store my evidence and my reasoning. Those two things are too important to be relegated to just being a part of something else.

I sense that you see a citation as a large container where you can store the "classical citation", but where you also put evidence as sentences or as formatted text or maybe as other things. You can also put in sentences describing what you think about the quality of the source. And you put in sentences or other statements about the conclusions you made by considering the evidence. I don't mind that you personally do that, but I definitely see that process as almost hiding away the most important information you have to reason about. For you, you don't consider it hidden, because you have established, over the years, a carefully honed process that you use. But for me and many others, you take the idea too far. For example, in your city directory examples, your citations include the actual line from the city directory. Not so bad really, but what if the evidence were a birth certificate; would the citation hold the entirety of that also? And the big question is, if the citation doesn't hold the entirety, then where is that entirety? For me it just doesn't make sense to put your evidence away in your sources in an unstructured form. It makes it very hard to use in general.

You have a system that works well for you. But that might be keeping you from seeing the advantages of having your source information, your evidence information, and your research notes/conclusion information kept separately so they can be handled more effectively and in different ways by software. It's not like you would lose anything. The evidence and the research notes and the source information would still be bound together as a set of database records.

You would like to see a prototype to demonstrate whether my ideas would work. First of all they're not my ideas, as I hope I will now show. I have described systems with the properties I advocate on this wiki before, so let me summarize three of them here:

First, there is the class of systems called "nominal record linkage" programs. They date from the 1970's. They are used to study family patterns in rural pre-industrial-revolution villages in Europe. They are detailed studies in which all public records (e.g., parish registers, land records) are converted into evidence records, loaded onto a computer, and those records are then subject to algorithms that reconstruct the actual living families, showing how the families evolved. These programs use the same set of data modeling concepts I have been describing. Evidence person records are extracted from parish registers and land records. Here the parish registers and the land records are the sources. The contents of those registers, that is, the entries written by the priests and the land registrars, are the actual evidence. That evidence is examined by researchers who extract it into evidence person records. Algorithms then combine the evidence person records into conclusion persons and families by deciding which sets of evidence persons refer to the same real individuals. The results reconstruct the families of the areas. The researchers then study things like family size distribution, death patterns in children, fecundity, mortality and so on.

Second, another prototype you can check is the NewFamilySearch tree. You have to be a church member or be approved for some other reason to have access right now. Yeah, it's ugly and you can't trust the data in it, but that's beside the point. The point is that the NewFamilyTree is completely based upon the idea of taking evidence person records from every conceivable source, registers, census records, poor quality GEDCOM files from anybody, LDS church records, family history books, and then letting the users of NewFamilySearch, in a wiki-like fashion, join those evidence person records together into a massive pedigree consisting of conclusion person records. (I'm not saying I approve of this approach; I'm just trying to show you examples of the ideas in use.) In order to support this, the NewFamilySearch must use the evidence person record concept. The NewFamilySearch uses the term "persona" for the evidence person and the term "person" for the conclusion person concept. They took the term "persona" from the GenTech model.

Third, a system you can look at is at ZoomInfo.com. This is not a genealogical application; it is a commercial application that attempts to find all the working professionals in the English-speaking world by automatically extracting data from the world wide web using natural language processing. Their software is continuously scanning the web. When "mentions" of persons are found they are extracted into the equivalents of evidence person records. Here, the sources are web pages, the evidence is HTML text, and evidence persons are extracted from that HTML text using natural language processing. Instead of facts about birth and death and relationships, these evidence person records are concerned with job titles, companies worked for, degrees earned and so on, but the concept is the same. The ZoomInfo application then takes the billions (yes, billions) of evidence person records it has extracted and combines them down (using algorithms similar in character to those in nominal record linkage, and also analogous to the algorithms I talk about in the DeadEnds context) into profiles of hundreds of thousands of real professionals. No human steps are involved in this process. I know a lot about this particular application because I was the geek who wrote the software that does the merging of evidence person records into conclusion persons.

These are three systems that demonstrate the ideas that I think should be supported by the Better GEDCOM model. In addition to these three example systems, you will find talk about records-based and evidence-based genealogy on the rise everywhere. It is an idea that is now sweeping into the world of genealogical software. Record-based and evidence-based genealogy are just synonyms for the same things I've been calling the evidence and conclusion process for twenty years. If Better GEDCOM is to be placed to be a model that can handle systems that support record-based genealogy, then the ideas I've been promoting need to be considered.
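
The idea common to the three systems above, combining evidence person records into conclusion persons, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the EvidencePerson fields, the matching rule (same name, birth years within two years) and the sample records are all invented for the example. Real record linkage uses far richer comparisons and is not taken from any of the systems named above.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class EvidencePerson:
    """One mention of a person, extracted from one source (hypothetical shape)."""
    source: str
    name: str
    birth_year: Optional[int] = None


def same_person(a: EvidencePerson, b: EvidencePerson, tolerance: int = 2) -> bool:
    """Toy matching rule: identical names and birth years within `tolerance`."""
    if a.name.lower() != b.name.lower():
        return False
    if a.birth_year is None or b.birth_year is None:
        return False  # not enough information to link on this rule
    return abs(a.birth_year - b.birth_year) <= tolerance


def link(records: List[EvidencePerson],
         match: Callable[[EvidencePerson, EvidencePerson], bool]) -> List[List[EvidencePerson]]:
    """Group evidence persons into conclusion persons using union-find."""
    parent = list(range(len(records)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if match(records[i], records[j]):
                parent[find(j)] = find(i)

    groups = {}
    for i, rec in enumerate(records):
        groups.setdefault(find(i), []).append(rec)
    return list(groups.values())


# Invented sample data, for illustration only.
records = [
    EvidencePerson("census A", "Joseph C. Wetmore", 1820),
    EvidencePerson("directory B", "Joseph C. Wetmore", 1821),
    EvidencePerson("passenger list C", "Joseph C. Wetmore", 1845),
]
conclusion_persons = link(records, same_person)
print(len(conclusion_persons))  # 2
```

The evidence records are never altered or merged away; each conclusion person is just a group that points back to them, so a wrong link can be undone by regrouping.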

Because of your expertise and because of how well you have developed a way of doing things, I don't think you see the value of having your evidence put into the form of evidence person records. All my arguments over the past months, and the descriptions of the three systems above are the best that I can do to try to convince you of that value. You won't see that value unless you can understand how you could use that value to help you do your work. In my post three or so above this one in this thread I tried to mention two of the important ones.

Consider my case. I've been working on a very large project for the Wetmore family for many years. I have data on thousands of Wetmores from records of all kinds of quality, and for some of those Wetmores I have a hundred or more items of evidence. I can't handle that mass of information by bundling together unstructured text into source (or citation) records. If I did that I would never be able to find anything. I have to get my evidence into a form that I can search through at any time and in any way. One of the problems I am working on now involves the Joseph C. Wetmores who lived in Nova Scotia, Massachusetts, Connecticut, and England in the 19th century. There were at least three of them, but they moved around constantly, one was absconding from the law so is hard to trace, one was a shoemaker but want-to-be inventor who travelled to England many times, where he was probably a bigamist, to try to sell his ideas. I have over a hundred evidence person records for persons named Joseph C. Wetmore, but things are still so confused that I can't yet tell who was who. If I didn't have those hundred plus evidence person records immediately available to work with and inspect, have in front of me on my computer screen in well-organized lists that I can sort with various criteria, it would be impossible for me to make any progress.

Enough for now! "Hwa is thet mei thet hors wettrien the him self nule drinken."

Tom Wetmore
hrworth 2011-04-14T20:38:50-07:00
Tom,

I have been trying to stay out of this discussion but, I have a comment on a small piece of your most recent posting. You said:

"string of text that is generated on demand from information that is stored in a database"

That is absolutely true. AND, in my humble opinion, is the Cause of the Source and Citation issue. Currently, it IS a string of characters (words) that are NOT strings of characters within a database. I fill in a number of fields, in a couple of screens, that make up that string of characters (word).

I have been told that I don't know what I am talking about, but that's OK. Not until we break that string of characters down, to where they came from, will we know where those characters go in the receiving database.

If the sending application uses some sort of template, there are still fields. The receiving end will break up that string of characters and put them into fields that it uses. Template or not. (at either end)

A TITL field with information in it, is put into a TITL field when received. How it is presented to the End User, may or may not have a Title Field. It may just be part of string of characters in a box at the other end.

I think we need to look at the string of characters (words) and understand what they are.

I have come to this by looking at Evidence Explained. Each Example has the "field names" when and how they are used.

If the sending and receiving application uses these fields, the BetterGEDCOM file can be created and taken apart by the receiving application.

IF either does not use any Templates, the "Free Form" string of characters would be put in a more general field for that Citation.
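
The hand-off described above can be sketched as follows. This is a minimal Python illustration, not a BetterGEDCOM definition: the set of known tags, the dictionary layout of an incoming citation and the function name receive_citation are all assumptions made for the example.

```python
# Tags the (hypothetical) receiving application has its own fields for.
KNOWN_FIELDS = {"TITL", "AUTH", "PUBL", "DATE", "PAGE"}


def receive_citation(incoming: dict) -> dict:
    """Store recognised fields one-to-one; keep everything else as free-form text."""
    stored = {"fields": {}, "free_form": []}
    for tag, value in incoming.get("fields", {}).items():
        if tag in KNOWN_FIELDS:
            stored["fields"][tag] = value  # a TITL goes into a TITL
        else:
            stored["free_form"].append(f"{tag}: {value}")
    if incoming.get("text"):
        # The sender used no template: only a single string is available.
        stored["free_form"].append(incoming["text"])
    return stored


templated = receive_citation({"fields": {"TITL": "City Directory", "ROLL": "12"}})
untemplated = receive_citation({"text": "City Directory, 1874, p. 44."})
```

Nothing is lost in either case; the difference is only whether the receiving program can act on the pieces individually or has to treat them as one block of text.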

Only one End User's opinion.

Thank you,

Russ
GeneJ 2011-04-14T21:51:15-07:00
@ Russ, Hiya!!

@ Tom, I have another deadline yet tonight. I'll try to read your post more carefully a little later.

I'm hoping we can become more aligned philosophically. Rather than the "model" being the focus, maybe needs or requirements should be more the focus.

Here in the states, it would be a mistake to try to dismantle that endnote/footnote/full reference note/BetterGEDCOM citation. It would break content and reverse course in terms of the discipline.

I really hope you'll go out and get a copy of Evidence Explained. The first two chapters are where you might focus your attention. She overviews there the guts of her approach.

Then also, as Adrian has done, perhaps take some time to look at the BCG work samples. Those are not footnotes from working files, but they are a good place to overview

Wish I had more time right now, but really have to run. --GJ
ttwetmore 2011-04-14T22:03:34-07:00
Russ,

I think you have the right answer. A citation is a string that is generated from information in records in a database. Here is what I believe those records are and how it should work. By the way this solution handles GeneJ's need to have evidence information and research notes as part of the citations.

First there are source records that represent the sources. If you think of them as GEDCOM SOUR records that's perfect, though the Better GEDCOM version will probably have more tags to allow a more complete alignment with ESM templates.

So here's an example source record for a city directory (I'll use GEDCOM syntax for the example):

0 @S1@ SOUR
  1 TITL Norwich, Connecticut, City Directory, 1874-1875
  1 PUBL McAlpern Publishing Company
  1 DATE 1874

Say on page 44 there is an entry for Daniel Wetmore, ship builder, 34 New London Turnpike. This is used to create an evidence person record that might look like this:

0 @I1@ INDI
  1 NAME Daniel /Wetmore/
  1 RESI
    2 DATE 1874
    2 PLAC Norwich, New London County, Connecticut, United States
      3 ADDR 34 New London Turnpike
  1 OCCU ship builder
  1 SOUR @S1@
    2 PAGE 44
    2 TEXT Daniel Wetmore, ship builder, 34 New London Turnpike.
    2 TEXT This is likely the Daniel Lorenzo Wetmore from Yarmouth, Nova Scotia.

Note how the SOUR tag in the person record has the page number and two text lines with information. Let's say that in this made-up GEDCOM version, the TEXT lines are supposed to be added to the citation string when it is created. Using TEXT lines like this allows people to add extra, custom text to their citations.
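
For software to use records like these, it first has to read the level-numbered lines into a tree. Here is a minimal Python sketch of such a reader, assuming the made-up syntax of the example above (it tolerates the leading indentation shown here and does not handle CONT/CONC continuation lines). The function name parse_gedcom and the node layout are illustrative, not part of any GEDCOM library.

```python
import re

# level, optional @XREF@, tag, optional value
LINE = re.compile(r"^\s*(\d+)\s+(?:(@[^@]+@)\s+)?(\w+)(?:\s(.*))?$")

EXAMPLE = """\
0 @I1@ INDI
  1 NAME Daniel /Wetmore/
  1 RESI
    2 DATE 1874
    2 PLAC Norwich, New London County, Connecticut, United States
      3 ADDR 34 New London Turnpike
  1 OCCU ship builder
  1 SOUR @S1@
    2 PAGE 44
    2 TEXT Daniel Wetmore, ship builder, 34 New London Turnpike.
    2 TEXT This is likely the Daniel Lorenzo Wetmore from Yarmouth, Nova Scotia.
"""


def parse_gedcom(text: str) -> list:
    """Turn level-numbered lines into nested dicts, one per top-level record."""
    root = {"tag": "ROOT", "children": []}
    stack = [(-1, root)]  # (level, node) pairs for the current ancestry
    for raw in text.splitlines():
        if not raw.strip():
            continue
        m = LINE.match(raw.rstrip())
        if not m:
            raise ValueError(f"not a GEDCOM-style line: {raw!r}")
        level = int(m.group(1))
        node = {"tag": m.group(3), "xref": m.group(2),
                "value": m.group(4) or "", "children": []}
        while stack[-1][0] >= level:  # climb back up to this line's parent
            stack.pop()
        stack[-1][1]["children"].append(node)
        stack.append((level, node))
    return root["children"]


records = parse_gedcom(EXAMPLE)
sour = next(c for c in records[0]["children"] if c["tag"] == "SOUR")
print([c["tag"] for c in sour["children"]])  # ['PAGE', 'TEXT', 'TEXT']
```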

Then, later, when you want to generate a citation for this particular item of evidence, that is, this line from the Norwich City directory, you ask your software to generate that citation, and it would create a string something like this:

"Norwich, Connecticut, City Directory, 1874-1875." McAlpern Publishing Company, 1874. Daniel Wetmore, ship builder, 34 New London Turnpike. This is likely the Daniel Lorenzo Wetmore from Yarmouth, Nova Scotia.

This string is generated by a template that knows what tags to look for in both the reference to the source, found in the evidence record, and in the source record itself.

There is nothing novel or new or unusual in this example. This is exactly the way that everybody who thinks about this problem seriously comes up with. It is simple, effective and reeks of being a perfect solution.

Just to say something that you understand and I hope is obvious to others reading this. There ARE NO citation records in the database this example comes from. There is only a source record and an evidence record created from information that was extracted from the source. That's it. The citation is generated "on demand" from information in these two records using a standard template that looks for the proper tags, and then formats the values of those tags, in some specific order, maybe using quotes, maybe using italics, all determined by a template, into a string to add to reports or footnotes or bibliographies.
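
The "on demand" generation described above can be sketched in Python. This is an illustration only: the dictionary layout, the function name format_citation and the fixed ordering of the parts are assumptions, and a real template (for example one modelled on Evidence Explained) would carry many more rules. Including the page number in the output is also an assumption of this sketch.

```python
# The source record, keyed by its identifier (values from the example above).
SOURCES = {
    "@S1@": {
        "TITL": "Norwich, Connecticut, City Directory, 1874-1875",
        "PUBL": "McAlpern Publishing Company",
        "DATE": "1874",
    }
}

# The source reference held inside the evidence person record.
SOURCE_REF = {
    "ref": "@S1@",
    "PAGE": "44",
    "TEXT": [
        "Daniel Wetmore, ship builder, 34 New London Turnpike.",
        "This is likely the Daniel Lorenzo Wetmore from Yarmouth, Nova Scotia.",
    ],
}


def format_citation(source_ref: dict, sources: dict) -> str:
    """Build a citation string from a source record plus one reference to it."""
    src = sources[source_ref["ref"]]
    parts = [f'"{src["TITL"]}."', f'{src["PUBL"]}, {src["DATE"]}.']
    if "PAGE" in source_ref:
        parts.append(f'p. {source_ref["PAGE"]}.')
    parts.extend(source_ref.get("TEXT", []))  # custom text added by the user
    return " ".join(parts)


citation = format_citation(SOURCE_REF, SOURCES)
print(citation)
```

No citation is stored anywhere in this sketch. Changing the template means changing format_citation, and every citation produced afterwards follows the new style without touching the records.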

Tom Wetmore
GeneJ 2011-04-19T20:44:48-07:00
APG Public List: Research Objective, Mills response
http://mailman.modwest.com/pipermail/apgpubliclist/2011-April/003388.html
GeneJ 2011-04-19T20:46:20-07:00
See also, APG Public List, Research Objectives
Claire Bettag response.

http://mailman.modwest.com/pipermail/apgpubliclist/2011-April/003358.html
testuser42 2011-04-20T05:32:58-07:00
Good detailed examples, thank you!
Now we need to see how we can "map" all this into a BG file.
AdrianB38 2011-04-22T05:03:51-07:00
Interesting views there.

I have a slight concern that ESM's view that "analysis is a part of everything that transpires in the research process" might be used to challenge the fact that "Analyse results" doesn't occur until step 4 of my process. Surely, someone might say, if ESM is right, then analysis should appear in every step of my process?

Well, there are a couple of points that one can put forward but first perhaps you need to understand what processes are about.

A process is drawn up, with steps (i.e. sub-processes), in order to help us understand WHAT happens. The exact, dirty detail of HOW it happens is not a concern at this point. It is very common in drawing up processes to find that the work that a human being does is nothing like as compartmentalised as a literal reading of the process steps would imply. People do one thing and mentally zip forward to at least think about the next couple of steps, then get back on the formal track. We've all done it - we've been sat in the Archives, ploughing through the documents that our search plan says we should look for, when suddenly - ping - you get a match so perfect, that you say "That just has to be her!" That's leaping forward into the Analysis step. Then, being the careful researchers that we are, we revert back to the "Carry out Research" step to ensure that there aren't more matches.

This doesn't invalidate the process steps, it just shows that human beings are better at parallel, non-sequential, processing than processes normally give them credit for.

So, I'd say that ESM would still work according to the steps described in my process - which, after all, isn't radically different from the well-known Genealogy Research process bubble chart. She simply zips forward to the Analysis stage, then back. And fortunately, I have a link back in step 5 to enable someone to go backwards, though perhaps I might review where those link-backs go to and from.

I should add that saying that process steps are not absolutely sequential is NOT always a good idea. If the process for acceptance of a new aircraft goes something like:
- design
- build
- test
- certify for passenger carriage
- trial in service
- use in service
then you most certainly can't zip forward to "trial in service" (and back) before you've got a certificate for passenger carriage.
AdrianB38 2011-04-22T05:10:32-07:00
I might add that I've been in cases where I've found evidence that I haven't even had a "focussed goal" for. E.g. sitting in Edinburgh archives reading up on my GG grandfather's bankruptcy, one sentence made the hairs on the back of my neck stand up for it told me where the artistic gene in our family comes from. That, of course, was not something I had ever thought to specifically look for.

It would be useful for me to test my finalised process against such moments of serendipity - I suspect that it would be absurd to bother with focussed goal, search plan, etc, for such a thing. The process - and the research type entities in the BG Data Model - do need to accommodate one minute discoveries.
gthorud 2011-04-22T10:12:13-07:00
Somewhat out of context, but the three things in Claire Bettag's response - Goals, Objective and Plan - are already in the administrative requirements, called Project, Objective and Task (plan = possibly several tasks).
GeneJ 2011-04-22T11:17:55-07:00
@Adrian:

Great post.

I'm going to follow your original lead and write more about things I blogged about--but I'm going to try to include more examples in that process.

You wrote, "analysis should appear..."
I approach the evidence process closer to the way Mills described it. Can't find the thread right now, but I recall posting pretty early on that analysis is part of deciding which record group I'll look at first.

The "persona" :) and "evidence" concepts, as those were explained to me early on, are in conflict with my need for valuable, readily accessible information about what I know and the evidence I have discovered, blah, blah, blah--those are the resources I depend on for my continued analysis.

BUT ... I have to do a better job of explaining that. I know I am not the best from my community to represent a summary of how we set out to do what we do. Those who are have published the books, given the lectures, written the articles, posted the websites--the stuff I've been trying to post--I am only doing my best to represent them.

On to my page ...

P.S. Serendipity. Hopefully I'll do an admirable job of explaining it--my process depends on finding evidence that way.
AdrianB38 2011-04-22T12:58:02-07:00
Gene - Re when you say "The "persona" :) and "evidence" concepts, as those were explained to me early on, are in conflict with my need for valuable, readily accessible information" ... and go on to say "I have to do a better job of explaining that"

I think we, the enthusiasts for E&C (whatever we call it) need to do a better job of explaining to you that 95% of the time, you won't see any difference between an application doing its stuff in an E&C manner and one doing its stuff in a Conclusion-only manner. And that 5% is liable to be the time when you're saying - "Now did that occupation apply to the John Doe who was married or the John Doe from the census?" - in other words, the tricky bits for any method.

And that view is only something I've started to hold after Tom described the way that nFS works - however imperfectly it does. Until then I'd imagined it might be a decision you, the inputter, made on a case by case basis. Nope. Too hard.
GeneJ 2011-04-22T14:14:31-07:00
@Adrian:

"enthusiasts for E&C (whatever we call it) need to do a better job of explaining..."

Hopefully I'll finish this before anyone works too hard on the 95% so that I can explain why anything shy of 100% is unusable for me.

"... Tricky bits for any method."

I have 100 percent now ... but I digress.
GeneJ 2011-05-15T09:51:37-07:00
Comment: Research process vs Search Process
GeneJ 2011-05-15T13:32:12-07:00

Where you have the element, "Current Conclusions" consider, "Current Conclusions/Body of Evidence/Body of Research."

Where you have the element, "Search Plan (detailed)," consider "Research Plan (detailed)."

Where you have the first quad of elements, "Analysis of identity," "Analysis of information ..." consider relating those to the "Current Conclusions/Body of Evidence/Body of Research" in order to learn "Interim Conclusions" and to find "Conflicting Evidence." (By definition, perhaps the same relates to the second quad of the same elements.)

Where you have the element "Updated Conclusions" consider "Updated Conclusions/Updated Body of Evidence/Updated Body of Research."

You start with "create or revise search plan" with a direct line to "search plan," as opposed to beginning with "research plan" which may or may not include many "search plans." Mark Tucker seems to use the words research and search interchangeably, too, in his diagram.
AdrianB38 2011-05-16T04:25:29-07:00
"Current Conclusions" etc
Yes - makes sense - you'd use everything that you have and that might not be formally written up. (Though it might need to be a bit snappier than that on the diagram)

Search v Research etc
Yes - there is a difference isn't there? Research plan is not just searching but how you'd analyse, etc, etc. I think when I wrote Search Plan I actually meant that but Research Plan is better.

"Where you have the first quad of elements, "Analysis of identity," "Analysis of information ..." consider relating those to..."
OK - need to see how the words pan out on that one. The analysis stage might get a bit rambling if I'm not careful...
GeneJ 2011-05-16T11:25:01-07:00
Lookin' good. --GJ
gthorud 2011-05-17T18:50:12-07:00
Please define Body of research.
gthorud 2011-05-19T17:46:55-07:00
Comments from Geir on the whole page
Finally finished my comments.

AdrianB38 2011-05-20T03:20:59-07:00
Thanks Geir - will print it out, read and think through...
AdrianB38 2011-05-21T12:19:23-07:00
Quick thoughts based on an initial read:

- The level of detail of my current document is somewhat light in the descriptions of the process steps (or sub-processes) - thus outputs (documented assumptions in sub-process 2) may not be explicitly referred to in the description (as you highlight). If the whole process is to be robust, then I'll have to ensure these things are mentioned.

- As a result of the above, the document is liable to get longer so I'll probably have a high level, cut-down version at the front of the document, since I do not want people to immediately get lost in detail.

- Goal v. Objective v. Task. Err. Yes, I think this is liable to cause confusion. So far as I remember, "Goal" came from the Tom Jones handout, so when I wanted a word for the next level down, "objective" seemed the natural choice. This may be partly from my background - Objectives had to be SMART
- Specific
- Measurable
- Agreed
- Realistic
- Timed (you may have variants on that mnemonic)
If we set aside Timed as being irrelevant to our hobby and Agreed as probably also irrelevant when there's only one person, it still seems to me that the top level "thing" might be sufficiently loose in its definition as to not merit being an objective. Whereas I wanted the next level down to be much more realistic, for one thing. Hence it seemed Goal and Objective worked. (And our Requirements Catalogue has 1 goal but a number of things on the level below.)

I'd not be happy with Objective and Task instead of Goal and Objective, as to me, a Task is something that is done (more or less), by 1 person, at 1 place, at 1 time. To give an indication of what levels I'm talking about:

Goal - Find the parents of X, born England 1850, married someone with first name Y.

Objective 1 - Find marriage certificate of X and Y (this gives X's father only)
Objective 2 - Find birth certificate for X

While the Tasks would be individual steps:
Task 1.1 List all candidate marriages from the index between X and Y in a period N-M, where M = birth of eldest known child plus 2 (say) and N goes back to when the 2 parties were of a legal age to marry;
Task 1.2 Choose most likely certificate and send off for it
Task 1.3 Given the details on the cert, can we prove the X on the cert is our X?

I do NOT imply that I'd want all the steps writing out. In fact - I think it very unlikely they would be.
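
As a rough illustration of the three levels above, here is a sketch using made-up class and function names (Goal, Objective, Task, marriage_search_window). The legal age of 16 and Y's birth year are placeholders for the example only, not a statement about marriage law or about any real person:

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    description: str


@dataclass
class Objective:
    description: str
    tasks: list[Task] = field(default_factory=list)


@dataclass
class Goal:
    description: str
    objectives: list[Objective] = field(default_factory=list)


def marriage_search_window(birth_years, eldest_child_birth_year, legal_age, slack=2):
    """Return (N, M): the first and last years to search in the marriage index.

    N is when the younger party reached legal_age (an assumption passed in by
    the caller); M is the eldest known child's birth year plus some slack.
    """
    n = max(birth_years) + legal_age
    m = eldest_child_birth_year + slack
    return n, m


goal = Goal(
    "Find the parents of X, born England 1850, married someone with first name Y",
    objectives=[
        Objective(
            "Find marriage certificate of X and Y",
            tasks=[
                Task("List all candidate marriages from the index in the period N-M"),
                Task("Choose most likely certificate and send off for it"),
                Task("Can we prove the X on the cert is our X?"),
            ],
        ),
        Objective("Find birth certificate for X"),
    ],
)

# Placeholder numbers: X born 1850, Y born 1848, eldest known child born 1872.
print(marriage_search_window([1850, 1848], 1872, legal_age=16))
```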

Some of your other comments are important but I think they may apply to the physical design of the application and its reports, rather than a more abstract analysis of process and data - but it's necessary I think about them to see if there are any implications.
gthorud 2011-05-22T08:12:02-07:00
Adrian,
I think it is correct to develop the research process document so that we can understand the requirements wrt data, but when that is done we have to acknowledge that genealogists do not only look FOR or IN sources, and analyze, they also do other things like correspondence, interviews, publishing something, taking a photo etc. Those are all things that can be recorded as to-do /task/logs/subprojects in current genealogy software, because those concepts are more general. We cannot ignore current programs. I believe that when the research process is “finished”, the recorded structures should be fitted into a more general picture, preferably without losing their specificity – but there might be a need for compromises in order to reduce complexity.

I see that there is a difference between your Objective and a Task – I should have put some more thought into what I wrote about Objective<->Task. You consider a task a smaller/detailed thing, and I consider it a more general thing. The terminology is probably best discussed after the concepts we want terms for are defined. Looking at the discussion of Admin01 there is one term that could be considered instead of Objective – a Search (fits with your examples of objectives, but is more specific) – and looking in a dictionary I also find Sub-goal (I have no problem with your term Goal).

Re. your last paragraph. It might seem you have some ideas about what should be an application issue only, what should be a BG issue, and perhaps also what should be a process description issue only. If so, it would be useful to have those principles clarified.

A new thing: The term Assertion is used in many contexts, eg Gentech, nFS. It would be useful if we could describe how that term relates to this, even if we do not use that term as an entity or whatever.
AdrianB38 2011-05-30T09:05:23-07:00
Update 30 May
I have started to recast the research process to take note of comments. The document is still relatively informal with missing definitions - please highlight what you want defining.

The flow of the process has been altered as when I tried to match this against some of my own research I found I was updating the database during the process, not just at the end. So I was looping round lower-level steps, updating the database, and then (potentially) updating the database at the end again.

Since I didn't want to write the update database stuff twice, I had to think how to rephrase it all and eventually realised that it made sense to portray the process as a sequence of lower level steps ("How do you eat an elephant? In small pieces..." Apologies to any vegetarian non-attenders of UK management training courses). The _last_ step delivers the overall goal, so it can all be written out in the same fashion.

I have altered the text and diagram to match.

I have also added some quite vague thoughts about entities, viz:
An output value of a person / group / place etc in the database (i.e. a PFACT) may be justified by one or more proofs
A proof may justify one or more values in the database
A proof may comprise a statement of conflicting evidence
A proof may comprise one or more conclusion statements
A proof may comprise a statement of analysis
A proof may use one or more items of evidence
A proof may be derived from a work portion
A work portion may give rise to a proof

An item of evidence may be found in a source (the citation - in the PAF sense - tells us where the evidence is to be found in the source).
An item of evidence may be found in a PFACT (i.e. a value of a person / group / place) etc recorded in database

As a consequence, note that the value of the item of evidence must be preserved somehow, even if the records for that real world person or real world source are updated in the database.
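
To make those vague thoughts slightly more concrete, here is a sketch only, with class and field names (PFact, Proof, Evidence) chosen for illustration rather than taken from the BG Data Model. It shows the many-to-many link between values and proofs, and one possible way of preserving the value of an item of evidence: copy it at the moment the evidence is recorded, so that later updates to the database record do not change it.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Evidence:
    value: str                              # snapshot taken when the evidence is recorded
    found_in: str                           # hypothetical id of a source or of a PFACT
    where_in_source: Optional[str] = None   # the "citation" in the PAF sense


@dataclass
class Proof:
    analysis: str = ""
    conflicting_evidence: str = ""
    conclusion_statements: list[str] = field(default_factory=list)
    evidence: list[Evidence] = field(default_factory=list)


@dataclass
class PFact:
    ident: str
    value: str
    proofs: list[Proof] = field(default_factory=list)   # one proof may justify many PFACTs


# An existing value in the database is used as evidence for a new value.
birth = PFact("pfact-1", "born 1850")
ev = Evidence(value=birth.value, found_in=birth.ident)
proof = Proof(
    analysis="placeholder analysis text",
    conclusion_statements=["placeholder conclusion"],
    evidence=[ev],
)
parents = PFact("pfact-2", "placeholder value", proofs=[proof])

# The birth record is later corrected; the evidence keeps the value it had.
birth.value = "born 1851"
print(ev.value)
```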

Compare to current GEDCOM
An output value of a person / group / place etc in the database (i.e. a PFACT) may be justified by one or more "citations"
A "citation" points to where to find something of note in the source.

In effect, it looks like - on a very swift and somewhat incomplete reading - that the proof entity in this model replaces the "citation" entity in the GEDCOM model. And can therefore sit alongside it when converting data from GEDCOM to BG.

Thus someone using software based on this data-model would see the proof and its components in place of the citation and its components. Those components would be directly equivalent to what is seen on "citations" now.

Any data based on "citations" only would be converted in some fashion so that it would be shown consistent with the new model.

In EFFECT I think I would see the same thing as now but with several extra boxes.

But I've not detailed that yet........
AdrianB38 2011-06-21T14:13:14-07:00
#1 - "I was convinced that there was one objective per Work portion. But reading the new definition of work portion it has only one objective". Yes - I seem to have firmed up on that, and there is (now) only the one objective - to aid simplicity and / or clarity.

However, my point still holds - a "work portion" and an "objective" are different, though linked, concepts. Note I am only talking about concepts here, not data entities. Specifically, the "work portion" has an objective, but also it has (off the top of my head) planned tasks, notes about progress, results, etc. (Items not meant to be rigorous). So the two concepts are different excepting only that one contains the other. If I talk about a data model then there is a 1 to 1 correspondence between work portion and work-portion-objective, which compels me to think that the objective would just be a text attribute of the work-portion. No argument about that.

#3 - WP and objective - if we talk data modelling then, as above, these are 1 to 1.

(Just as an aside - not for the first time I note that I haven't yet modelled work-portion etc in the data model. It just seems to be about the final stage. Not sure why I've not done it yet.)

WP and Proof - I _think_ that in the same way that a WP has 1 objective, for simplicity, then a WP gives rise to one proof (a proof argument or proof summary or whatever). I would envisage that the one proof could be written out - coming to several conclusions about different genealogical values. Splitting it into more than 1 proof would seem to create potential confusion with needs to cross-refer to other bits in the same WP.

So - 1 work-portion has exactly one objective and may have exactly one proof - each proof may come to 1 or more conclusions, etc.

4. Brain storming - need to think about these...

6. "the proof gives rise to conclusion statements, I read that as the statements are written after you have written the proof" - no - in effect, when you look at the written words of the proof, I mean that the conclusion statements are part of the proof words. ("Words", not entities).

"I wonder what the purpose of having analysis and conclusion statements as separate “somethings” would be?" In essence I designed them separately for 2 reasons:
- if the proof has one analysis text (well, two - one for identity and one for evidence), it can have multiple conclusions. Therefore, and with apologies to Tom, this is a data model and (a) I psychologically can't do anything other than normalise multiple attributes and (b) even if I could overcome the psychology, the diagram wouldn't show up the multiple nature very well. Again - I have no problem with the idea of the physical implementation denormalising.

And the 2nd reason to separate them is to keep them physically separate for clarity. In my view, if we don't separate them and stick relationships on the separate bits (said without prejudice to the physical version of this) then we might just as well have a shared note to contain all this lot.

So yes - this _could_ enable an app to just list conclusions, hiding the proof that sits behind.

"Maybe the simplest thing is to have the analysis and conclusions in the same field, and probably also the conflicting evidence" - but then, in my view, as stated above, we haven't progressed beyond a shared note.

"Regarding replacing the Citation (or rather Reference Note) entity with a proof entity" - I don't think that would be a good idea for compatibility reasons, as I said and as you point out. The 2 need to co-exist. But I haven't sorted out how that could look in an app or report.

"I think it is better to link to the proof from the Reference note entity" - I am not yet convinced that would be the level to link things (if linking were a good idea) because I am wholly unconvinced that proof and Reference note are (in whatever sense) equivalent. I suspect that proof and SOURCE-RECORD are in some sense equivalent. The Conclusion-statement and Reference-note are closer but the cardinality is all wrong. Maybe a Conclusion-statement is closer to a page within a Source - which GEDCOM doesn't model (we have "where within source", which is a pointer that points to nothing in a model!)
brianjd 2011-06-22T11:03:24-07:00
#1 - It seems to me that if work-portion has more than 1 objective then it is not the base level work-portion (WP), but a collection of work-portions within a master WP. I think that goes to an implementation and not a model, except to say a work-portion may contain other work-portions. But to me, it is not the way to go. Leave it as an atomic unit, and if needed add a work-portion group object to join WPs into a master object.

#3 - It would be my argument that there can be only one proof from a "work portion" (btw - I really hate that term). Any other proofs would necessarily be a subset of some other proof which encompasses them all.

#4 seems, to me, to go to implementation and doesn't really belong in a data model. Implementations should be left out wherever possible.

#6 - Just losing me on this. I get the reason for separating the proof and conclusion. But, at least the way I do it, I see a bunch of evidence, from this evidence I draw a conclusion and then write an argument (proof) supporting my conclusion. So the conclusion comes first and the proof is made up of words which explain *why* I think my conclusion is proper. I'm not sure what it means to have "conclusion statements that are part of the proof words".

I'm not sure how we will support pulling in proofs and conclusions from standard GEDCOM.

On the data model. It doesn't link the conflicting evidence statements to the genealogical value justified by the conclusion. It would seem vital to me to have these linked. Unless you want to live in a rose-colored lenses world. I, personally, want to see the conflicting evidence attached to some value that is the result of a conclusion. Just in case more and more conflicting evidence pops up. Perhaps the conclusion is no longer valid in the face of new conflicting evidence.

Lastly what is analysis of identity-text and analysis of evidence text? Are you seriously thinking about splitting out the proof into some number of sub-objects? Should this not just be a simple text object? I hope we aren't over-thinking this. A conclusion may be as simple as "Joseph Smith, Jr died 27 Jun 1844", with a proof as simple as "death certificate #999 on file at the Town Clerk in Carthage, IL, USA ". Or it could be much more complicated, with links to Wikipedia and history texts and written recollections of witnesses. The assumption here being, that the Joseph Smith, Jr. being researched is the same one as the founder of the Mormon Church.
gthorud 2011-06-22T18:17:42-07:00
Interestingly, I just discovered that the Data model is really there, not just an empty page. Let's see where that brings me - will be back Thursday.
AdrianB38 2011-06-23T10:12:42-07:00
Data Model is on the main page, which is getting big, hence I'm intending to move it onto the (so far) empty page. Eventually.
AdrianB38 2011-06-23T13:50:57-07:00
Brian
#1 - agreed - and in fact your justification is probably more elegant than my thoughts so far.

#3 - I agree. (Btw - I'm not wild about the term "work portion" either, but it just fell out of my trying to explain what it was - it's just a useful sized chunk of work!)

#4 - tend to agree

#6 - a touch of "please see previous posts here". What really goes on is a lot of iterative stuff. Evidence / analysis / tentative conclusion / further analysis / conclusion unchanged but more certain / formal write-up of proof - all that sort of stuff, with a lot of loops round and round. Plus, of course, cases of "further analysis / oh heck that doesn't work any more".

I guess in an application, I would design it so that the "proof" item would include the first tentative analysis and the "conclusion" item would include a first tentative conclusion, then the previous "proof" text would be replaced by a clearer version that just supports the hitherto tentative conclusion and omits the blind alleys that I went up.

"conclusion statements that are part of the proof words" - I'll have to check the BCG stuff but I think their sample proof documents conclude with the conclusions statement. It's a case of "Is the proof the whole thing, including the conclusion? Or is it just the bit, ahem, proving the conclusion?" Guess I'll have to decide and write some definitions accordingly.

"I'm not sure how we will support pulling in proofs and conclusions from standard GEDCOM". We can't - those words are in all sorts of places. When they're there.

"It doesn't link the conflicting evidence statements to the genealogical value justified by the conclusion" - it does - you just have to go via the proof. But I have been toying with the idea of linking from value to proof (which is many:many) and then - when selecting the proof in the application, one could see a list of its conclusions and a list of its conflicts at pretty much the same time.

"what is analysis of identity-text and analysis of evidence text? Are you seriously thinking about splitting out the proof into some number of sub-objects?" Yes. This is me against American Genealogy!!

Taking my tongue out of my cheek and being more serious, I am, nonetheless, fed up with reading articles telling us how to do citations, etc, that, while well written in themselves, avoid any mention of "How do we know it's the right person?" Even your simple example only proves that SOMEONE named Joseph Smith Jr died on 27 Jun 1844. We need something else that says (e.g.) "There is only one person in the US census of 1840 named Joseph Smith" (well, that won't work - there are 932!). More likely, we quote newspapers mentioning the death of the founder as being around then, find the death certificate, note the matches to the founder and the lack of any other JS death certificate in the area and era, and then that's OK.

Or to put it into Tom's terms, "death certificate #999 on file at the Town Clerk in Carthage, IL, USA" is the proof of evidence for the death for a persona whose name is Joseph Smith, while the proof of identity is what is linked to the creation of a conclusion person that "merges" (non-destructively) the persona from the death cert to the previous conclusion person for the founder and creates that new conclusion person.

It might very well be over complicating things but I seriously want to stick the question "WHICH JOSEPH SMITH IS IT?" right in front of people's eyes and splitting it seems to do it. Aside from which, splitting it also works well with the persona / multi-level evidence / conclusion person tree.
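
One way to picture the split between proof of evidence and proof of identity, as a sketch only: the class names (Persona, ConclusionPerson), the merge function and the proof texts below are all hypothetical placeholders, and this is not a description of how nFS or any other program does it. The point is that the merge is non-destructive and cannot be done without answering "which one is it?":

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Persona:
    name: str
    source: str
    proof_of_evidence: str       # why we believe what this record says


@dataclass(frozen=True)
class ConclusionPerson:
    label: str
    personas: tuple[Persona, ...]
    proof_of_identity: str       # why these personas are the same real person
    previous: Optional["ConclusionPerson"] = None   # earlier level, kept intact


def merge(person: ConclusionPerson, persona: Persona, proof_of_identity: str) -> ConclusionPerson:
    """Create a NEW conclusion person; the old one and the persona are untouched."""
    if not proof_of_identity.strip():
        raise ValueError("Which person is it? A proof of identity is required.")
    return ConclusionPerson(
        label=person.label,
        personas=person.personas + (persona,),
        proof_of_identity=proof_of_identity,
        previous=person,
    )


founder = ConclusionPerson("Joseph Smith, Jr (the founder)", (), "starting point")
death_persona = Persona(
    name="Joseph Smith, Jr",
    source="death certificate #999",
    proof_of_evidence="placeholder: certificate records a death on 27 Jun 1844",
)
merged = merge(founder, death_persona, "placeholder: details on the certificate match the founder")
print(len(founder.personas), len(merged.personas))
```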
gthorud 2011-06-24T17:08:16-07:00
Having finally discovered the data model, a few things in this discussion make more sense to me.

But I still don’t think it is a good approach to discuss this unrelated to what could be “output” in various places in reports. If we don’t discuss it in a context a lot of the reasons why we want to do things in a certain way are lost, only to have to be discussed later. So discussing only the data model does not make sense to me.


If we assume that the objective of a work portion is to find the answer to a small problem, that would at most involve a small number of pieces of evidence and result in a few genealogical values, and if we disregard how things should look in reports, I have trouble seeing that we need to split things into analysis, conclusion and conflicting evidence – in most cases the whole thing will be a sentence or a paragraph – so it should not be much reading to figure out everything that is relevant for a genealogical value. I think one option should be to keep things simple. If the work portion does not result in genealogical values, or is something that is not about sources-analysis-conclusions, then a text field for the recording of progress/result of that work portion is also sufficient.

So even if we can split things logically into proof, conclusion and conflicting evidence – that fact is not a reason for splitting them in some way in an implementation. So what would the reason for splitting be?

Adrian writes about “clarity” as a reason. Does that mean that we have to identify the pieces, so the user can understand which pieces of text are the argument, conclusion and conflicting evidence? Should that not be rather easy to understand from the text itself, even if merged in one field? It is unlikely that you will present the three as individual parts as such in a report.

Or, are we trying to teach the user how to document genealogical research – “You always have to write an argument, a conclusion and mention any conflicting evidence” – so the user interface will tell you that by providing suitable fields? You don’t need separate fields to tell the user that.


In any case, consider that there are existing programs with tasks/to-do’s etc. that all have one thing in common: they have only one text field for recording the analysis, conclusion, result or whatever it is called. If we split into proof/conclusion/conflicting, we would have to have a field in parallel that can contain it all if you want to be backwards compatible with the data in existing programs. (I am not talking Gedcom here, but data that do exist, for which you must provide an upgrade path.)


Re. linking from a reference note to proof/conclusion. Having seen the data model, I agree that if any linking is to occur, it would be better to link to the conclusion rather than the proof (analysis). But assuming that a reference note will also contain some analysis, I don’t think you want to repeat the analysis in all citations that may result from a work item. If you were to reference a conclusion+analysis etc for inclusion in a reference note, it would have to be in the simple cases where you only produce one genealogical value with one conclusion and one analysis - and there might be a lot of these simple cases. If there are several conclusions, many values, and a more complex analysis – you would probably write the reference note from scratch – possibly with some assistance from the program that could make it easy to use some info from the research process entities.


Re. Brian’s last comments.

#1 – I don’t think anyone has suggested to have several objectives per work portion?

#3 - Having understood what a proof is (I hope), I agree that it is sufficient with one proof per WP.

#6 (and reading Adrian’s last posting):

(Looking at the data model, I am also confused by the “conclusion statements being part of the proof words”.)

Pulling in proofs from Gedcom. That was only relevant if you wanted to create proof/analysis/conflicting from the citation info in Gedcom, see first posting on this topic, a thing that I at least don’t think is realistic.

Re. Linking conflicting to genealogical values. Does the conflicting evidence apply to all conclusions/values (as is the only possibility if you go via the proof)?

Splitting the proof into sub-objects. It seems to me that the whole thing is getting far too complex. I am moving in the direction of wanting to have two fields giving the proof/conclusion/conflicting/etc for a work portion/objective – one of these that MIGHT in some simple cases be used for output and one for the researcher's private use. We can also link to the genealogical values resulting from the work portion.

If we want to have entities documenting the logic with “multipart proof”/conclusions/conflicting evidence (possibly per conclusion/gen value) that should be an optional set of entities, and it should be possible to convert the info in these entities into one field, following some defined rules, if the importing application does not support the split into a lot of entities. If you can do that, it does not really matter if the logic gets complicated in terms of entities and relations, because it will be a program's choice to implement it or not.
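
A sketch of what "convert the info in these entities into one field, following some defined rules" might look like. The key names, labels and ordering below are assumptions made up for this example; the actual rules would have to be defined by BG:

```python
# Hypothetical rule set: the order of the parts and the label put in front of each.
SECTIONS = [
    ("analysis_of_identity", "Identity"),
    ("analysis_of_evidence", "Evidence"),
    ("conflicting_evidence", "Conflicting evidence"),
    ("conclusions", "Conclusion"),
]


def flatten_proof(proof: dict) -> str:
    """Collapse a split proof into one text field for an importer without the entities."""
    parts = []
    for key, label in SECTIONS:
        value = proof.get(key)
        if not value:
            continue  # optional parts are simply skipped
        # A part may hold one text or several (e.g. several conclusions).
        items = [value] if isinstance(value, str) else list(value)
        parts.extend(f"{label}: {item}" for item in items)
    return "\n".join(parts)


split_proof = {
    "analysis_of_identity": "text text text",
    "conflicting_evidence": ["text A", "text B"],
    "conclusions": ["text C"],
}
print(flatten_proof(split_proof))
```

Going the other way (one field back into separate entities) is not attempted here; the labels would make it possible only if nobody edits the flattened text.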
brianjd 2011-06-24T21:27:50-07:00
I tend to be with Geir on this. I think way too much normalization is going on, making things more and more complex. This model is going down the path of the gentech model. If things go on this way, soon we'll have so many different kinds of records that it will become much too much to deal with, even to think about wanting to code.

The simplest solution is usually correct. Let's focus on building a bare skeletal model of everything, and then later divide out things that need it.

I will likely rarely use the proof conclusion pieces. When I do it's for the few individuals that have substantial conflicting evidence. I deal with this evidence in two ways, by creating multiple events for a person, and set one as the default. Or I make notes. Having evidence persons would go a long way streamlining this process. I don't make long drawn out proofs. I use evidence and rank the evidence to do this for me. When conflicting evidence exists, then an explanation is required. But most of my "proofs" are proven with the data. Or in a brief text comment on why I believe the way I do.

Conflicting evidence is merely entered with all the other evidence and linked to the person or event to which it belongs. Each piece of Evidence could have its own "proof" field, then proofs are simply built from the evidence by merging it all into a display. Then the conclusion person would need a conclusion field.

Then the need for all this multipart/multilevel/sub-object business becomes moot. Put the arguments and conclusions where they belong and life gets a lot simpler, for the end-user. It becomes a merge issue for the coder. But that's simple enough.

Normalization may seem like a great idea for analysis purposes, but can take away key perspective in interaction information and leave one out of touch with the actual natural data flow. Normalization is also a good thing for relational databases, but that should not be the goal of a standard to dictate.

We should think in terms of workflow. Will this work better for data entry by normalizing or not? How would it be best to enter this data? Where does it naturally fall in data entry? These are the kinds of questions that should be shaping the model. Not a technical viewpoint. Sure it's much harder to think in the more abstract, but it will lead to a model that is naturally user friendly, and naturally codeable in a user friendly way. It's win-win. Or maybe I'm off my nut.
AdrianB38 2011-06-25T08:00:16-07:00
I need to think through these issues a bit more before committing a full response but a couple of themes:

1. "I have trouble seeing that we need to split things into analysis, conclusion and conflicting evidence" and similar comments. My issue with not splitting it, is that if we don't, then effectively we have just a shared note and we haven't advanced the study of genealogy. A classic complaint is that people mix up person X with their parent of the same name. My hope - possibly forlorn - is that by presenting the user with a box labelled something like "proof of identity", then some of them might, just might, think - "Yes, how _do_ I know it's the X who did ...?"

So yes, Geir, I am trying to teach the user how to document genealogical research and if I don't highlight the bits by showing separate fields, it just won't work for many of them.

2. "I am also confused by the conclusion statements being part of the proof words."

OK - let me try again in XML:
<TotalProofWords id="9999">
...<Objective> To xxx xxx xxx </Objective>
...<AnalysisOfIdentity> text text text </AnalysisOfIdentity>
...<AnalysisOfEvidence> text text text </AnalysisOfEvidence>
...<SomeOtherAttribute> xxxx </SomeOtherAttribute>
...<ConflictingEvidence id="999"> text text text </ConflictingEvidence>
...<ConflictingEvidence id="999"> text text text </ConflictingEvidence>
...<ConflictingEvidence id="999"> text text text </ConflictingEvidence>
...<ConclusionStatement id="999"> text text text </ConclusionStatement>
...<ConclusionStatement id="999"> text text text </ConclusionStatement>
...<ConclusionStatement id="999"> text text text </ConclusionStatement>
</TotalProofWords>

Print routine would print all that lot.
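
As a rough illustration of such a print routine, and of Geir's point about converting the split entities into one field, here is a minimal Python sketch. The element names come from the XML above; the sample text, the labels and the flattening rule (one labelled line per element, in document order) are assumptions made for the example, not part of any BG specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample using the element names from the post above;
# ids and text are placeholders invented for this sketch.
SAMPLE = """
<TotalProofWords id="9999">
  <Objective>To find the parents of X</Objective>
  <AnalysisOfIdentity>Why this X is the right X</AnalysisOfIdentity>
  <AnalysisOfEvidence>What the records say</AnalysisOfEvidence>
  <ConflictingEvidence id="1">Census gives a different age</ConflictingEvidence>
  <ConclusionStatement id="1">J is the father of X</ConclusionStatement>
</TotalProofWords>
"""

# Assumed display labels; unknown tags fall back to the tag name itself.
LABELS = {
    "Objective": "Objective",
    "AnalysisOfIdentity": "Identity",
    "AnalysisOfEvidence": "Evidence",
    "ConflictingEvidence": "Conflict",
    "ConclusionStatement": "Conclusion",
}


def flatten_proof(xml_text: str) -> str:
    """Merge every child element into a single text field, in document order."""
    root = ET.fromstring(xml_text.strip())
    lines = []
    for child in root:
        text = (child.text or "").strip()
        if text:  # skip empty elements
            lines.append(f"{LABELS.get(child.tag, child.tag)}: {text}")
    return "\n".join(lines)


print(flatten_proof(SAMPLE))
```

An importing program that only has one note field could store the flattened string; the reverse direction (splitting one note back into parts) is not attempted here.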

3. "if the importing application does not support the split into a lot of entities" - I am seriously worried by this idea. I can cope with the idea that an application might omit a big chunk of BG such as multi-level evidence and conclusion people - but that has to be a big, self contained, officially notified and probably officially sanctioned chunk, allowed solely to get BG out on the streets and in the code. If we attempt to design for relatively minor variations - which is what I think this is - then we are wasting our time and designing for failure. These guys either support BG - in one of a very small number of variants - or they don't. (In reality, they'll ignore bits without our help...)

4. Normalisation vs. Coding - sorry, I can only repeat - I am not designing the data model for a physical implementation. I am analysing what is there. The question of the flow of data is hugely important and it's not one that is answered by a data model. You're not off your nut, Brian, not at all - you're just asking a different question.
gthorud 2011-06-25T19:08:02-07:00
Just a point of clarification. I think I have learned during the work on BG that it is unlikely that it will be the current contributors to BG that will decide what gets implemented in programs. I see our main role at the moment to describe possible extensions to Gedcom, and if we were to agree on every possible extension I am afraid that very little would be produced. I therefore think we should allow ideas to be developed, described and discussed. If we are able to design functionality in such a way so that implementations not choosing to implement a certain feature, can interwork with those who have implemented it, that is an added bonus.
louiskessler 2011-06-25T20:35:00-07:00
Geir,

To me, what seems to be going on at BetterGEDCOM is very abstract.

BetterGEDCOM already has ideas galore embedded within the many discussions. We're not going to agree, no matter how many months or years everything gets discussed. And the reason is that there is no one way to do anything.

I'd like to see much more concrete work done. Something that might lead to a specification that BetterGEDCOM can recommend to the Genealogical community.

Louis
igoddard 2011-10-01T08:37:58-07:00
"One question. How can you have a proof before you've made either a hypothesis or a conclusion. A proof proves a conclusion. A conclusion doesn't follow from a proof. Unless there is first a hypothesis.

I would argue a hypothesis is pointless in this subject area, as you don't want to prejudice the interpretation of the evidence. You want to gather the evidence and draw a conclusion and then support that conclusion with a proof.

I wouldn't even begin to know how to do a proof without first having something to prove. Evidence doesn't need proving, but that's the way the chart reads to me."

Looking at this from the PoV of having been a working scientist, acceptance of any hypothesis is conditional on failure to disprove it. A test of a hypothesis is one which has the ability to disprove it. The whole of science is a collection of not-yet-disproven hypotheses.

From a genealogical point of view this means that I'll start with some record, form a view as to what that means for my family history and then look for more stuff which has the potential to disprove that view. As more stuff comes in either the view is disproven or else it's supported. My confidence depends on the number of things I've found which could have disproved the hypothesis but failed to do so. (Actually this is only true of a small amount of the way I spend my time; most time these days seems to be spent getting dragged into other people's puzzles ;)
ttwetmore 2011-10-04T05:51:14-07:00
In genealogy most hypotheses are of the form "the person mentioned in this item of evidence is the same as the person mentioned in that item of evidence."

In most desktop genealogical applications the two person records for these persons would get merged, and the reason why they were merged (the "proof statement" or maybe "hypothesis statement") is never written. At best, the sources for the two original persons would be maintained in the merged person. Anyone coming afterwards would have no clue as to how or why information from the different sources of data were combined into this person.

In the DeadEnds model, and others, no combining is done because a third "hypothesis" person is introduced, though this person is more normally called a "conclusion" person. The hypothesis person refers to the two evidence persons which are left unchanged by this process. In full n-tiered approaches, multiple evidence persons can be joined by hypothesis persons, and higher level hypothesis persons can be built from lower level hypothesis persons. The DeadEnds model allows n-tiers, and other models do also.

Please continue the thought process one step further. In the evidence persons there are source references to the evidence where you extracted the evidence persons from. What would you put in the hypothesis person instead of source references? I hope it's obvious. You need a hypothesis statement or a proof statement, or whatever you choose to call it, a statement that justifies your belief in joining the two evidence persons into a hypothesis person.

In the past I have said that such a conclusion is "just another source," or at least it is totally analogous to a source, that source being my brain and my thought processes, but so far this has been too much for the brains of Better GEDCOM to comprehend, or to be gentler, maybe to accept. Regardless, where we have source references in evidence based data, we need hypothesis references in hypothesis based data.

Summary: a hypothesis/conclusion person is the digital manifestation of a hypothesis in a genealogical database. The source reference in the person refers to the "proof/hypothesis" statement that justifies the existence of the person.
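
The n-tier idea can be sketched in a few lines of Python. This is only an illustration under assumed names (`EvidencePerson`, `HypothesisPerson`, `justification`, `evidence_leaves`); it is not the DeadEnds record format. The point it shows is that evidence persons carry a source reference, hypothesis persons carry a justification instead, and nothing is merged or destroyed.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class EvidencePerson:
    name: str
    source_ref: str  # where this persona was extracted from


@dataclass
class HypothesisPerson:
    # Components may be evidence persons or lower-tier hypothesis persons.
    components: List[Union[EvidencePerson, "HypothesisPerson"]]
    justification: str  # plays the role a source reference plays for evidence


def evidence_leaves(person) -> List[EvidencePerson]:
    """Walk down the tiers and collect the untouched evidence persons."""
    if isinstance(person, EvidencePerson):
        return [person]
    leaves = []
    for part in person.components:
        leaves.extend(evidence_leaves(part))
    return leaves


# Hypothetical records, invented for the example.
baptism = EvidencePerson("Wm Arden", "baptism register entry")
marriage = EvidencePerson("William Arden", "marriage register entry")
burial = EvidencePerson("William Ardern", "burial register entry")

tier1 = HypothesisPerson([baptism, marriage], "Same parish; ages agree.")
tier2 = HypothesisPerson([tier1, burial], "Burial age matches baptism year.")

print([p.name for p in evidence_leaves(tier2)])
```

Anyone coming afterwards can read `justification` at each tier to see why the personas were joined, which is exactly what is lost in a plain merge.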
brianjd 2011-06-08T19:38:55-07:00
One question. How can you have a proof before you've made either a hypothesis or a conclusion. A proof proves a conclusion. A conclusion doesn't follow from a proof. Unless there is first a hypothesis.

I would argue a hypothesis is pointless in this subject area, as you don't want to prejudice the interpretation of the evidence. You want to gather the evidence and draw a conclusion and then support that conclusion with a proof.

I wouldn't even begin to know how to do a proof without first having something to prove. Evidence doesn't need proving, but that's the way the chart reads to me.

Personally, I think it's getting too complex.
I have a *goal* I want to complete, I have *sources* I'm going to *search* through, and maybe some other *tasks* to perform along the way, as a part of my *plan*, *logging* the *results* of those searches and the *repository* where they are found. If I really enjoyed writing all day long I might include a description of the sources, but that's pretty obsessive if you ask me. When all the searches/tasks are *complete* I will draw a *conclusion*, which includes the possibility of a lack there of, and give my *proof* of my conclusion with *arguments*, noting any *conflicting evidence*.

I don't do my research this detailed, but I would if doing it for a living.

So what I have here is:
1) A GOAL, which includes
a) a PLAN composed of
i) SEARCHES which have
a) SOURCES
1) at REPOSITORIES
2) with DESCRIPTIONS and
3) RESULTS and
4) COMPLETIONS and
5) CONFIDENCE
ii) TASKS
1) potentially at REPOSITORIES
2) with DESCRIPTIONS and
3) RESULTS and
4) COMPLETIONS and
5) potentially CONFIDENCE
b) A CONCLUSION which is composed of
i) a PROOF with
1) ARGUMENTS
2) CONFLICTS to my proof.

You'll note I have not need for a separate *log* as it is self logging, although there is no reason why a goal couldn't link to the results, completion states and conclusion to display a log of where the goal is at. But this would be a programmatic thing, even if demanded by the standard. Or redundant and I hate redundant things.

I don't like your model as it makes it all look independent of each other, but linked, and it is really one thing encapsulated within another. A plan is meaningless without a goal, nor is a task meaningful without a plan.
brianjd 2011-06-08T19:40:42-07:00
It sure would be nice if the formatting I tried to give my comment was maintained in the posting.
brianjd 2011-06-08T19:43:26-07:00
1) A GOAL, which includes
--a) a PLAN composed of
----i) SEARCHES which have
------a) SOURCES
--------1) at REPOSITORIES
--------2) with DESCRIPTIONS and
--------3) RESULTS and
--------4) COMPLETIONS and
--------5) CONFIDENCE
----ii) TASKS
------1) potentially at REPOSITORIES
------2) with DESCRIPTIONS and
------3) RESULTS and
------4) COMPLETIONS and
------5) potentially CONFIDENCE
--b) A CONCLUSION which is composed of
----i) a PROOF with
------1) ARGUMENTS
------2) CONFLICTS to my proof.
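
The same encapsulated structure can be written down as nested records. The Python sketch below is one possible reading of the outline, with assumed field names and sample values; it also shows the "self logging" idea, where a log is derived from the searches and tasks rather than stored separately.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Search:
    source: str
    repository: str
    description: str = ""
    result: Optional[str] = None
    complete: bool = False
    confidence: Optional[str] = None


@dataclass
class Task:
    description: str
    repository: Optional[str] = None
    result: Optional[str] = None
    complete: bool = False
    confidence: Optional[str] = None


@dataclass
class Proof:
    arguments: List[str] = field(default_factory=list)
    conflicts: List[str] = field(default_factory=list)


@dataclass
class Conclusion:
    statement: str
    proof: Proof = field(default_factory=Proof)


@dataclass
class Plan:
    searches: List[Search] = field(default_factory=list)
    tasks: List[Task] = field(default_factory=list)


@dataclass
class Goal:
    question: str
    plan: Plan = field(default_factory=Plan)
    conclusion: Optional[Conclusion] = None


def research_log(goal: Goal) -> List[str]:
    """Derive a log from the plan; negative results are reported too."""
    lines = []
    for s in goal.plan.searches:
        status = s.result if s.complete else "pending"
        lines.append(f"Search; {s.source}; {s.repository}; Result: {status}")
    for t in goal.plan.tasks:
        status = t.result if t.complete else "pending"
        lines.append(f"Task; {t.description}; Result: {status}")
    return lines


# Hypothetical sample data.
goal = Goal(
    question="Who were the parents of X?",
    plan=Plan(
        searches=[Search("Baptisms 1750-1798", "Parish church (hypothetical)",
                         result="Not found", complete=True)],
        tasks=[Task("Learn which record sets exist")],
    ),
)

for line in research_log(goal):
    print(line)
```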
AdrianB38 2011-06-09T13:28:44-07:00
Brian, Thanks for responding.

"How can you have a proof before you've made either a hypothesis or a conclusion?" Well, we're both agreed that a hypothesis is not a good idea (we might very well have an idea lurking in our mind but that's probably a personal thing, not necessarily to be recommended). But I'm curious when you say "I wouldn't even begin to know how to do a proof without first having something to prove." I think we may be taking a subtly different definition of the problem in hand. If this were a mathematical proof, then yes, you start with the "end" statement (that appears after some tentative work) and come up with logic to prove it - QED. But that's not what we're doing here. This is problem solving, not proving the truth of an existing statement. It's like calculating the elevation of a gun required for a shell to hit the target. The "working out" that we end up with, is what I'm referring to as a proof.

You may want to say that "proof" is not a good word for what we end up with and I wouldn't argue but I tried phrases like "show your working" and got nowhere - "proof argument" and "proof summary" came from the BCG site and seemed to be acceptable.

I don't think there's a dramatic difference between your process and mine - except of course, I've taken a lot more words to write it out because (a) that's just me and (b) I wanted it to be clearer what the concepts were about. And I've had others say they need _more_ definitions...

Where I would take issue with you is when you say "When all the searches/tasks are *complete* I will draw a *conclusion*, which includes the possibility of a lack there of, and give my *proof* of my conclusion with *arguments*". Written out in that order, to my pedantic mind, it suggests the conclusion comes first, then you back it up with proof and arguments. In reality of course, you're linking the data together (analysing it all) as you're going along and out of those analyses comes the conclusion. I have to write the analyses down otherwise I forget - and those analyses form the draft proof and arguments, which don't come from nowhere.

"You'll note I have not need for a separate *log* as it is self logging" - if everything appears elsewhere, then no, there wouldn't be a need for a separate log but I'd not got that detailed yet and in any case I was thinking that negative results (for one reason or another) would go into the log and be stored there.

"I don't like your model as it makes it all look independent of each other, but linked, and it is really one thing encapsulated within another. A plan is meaningless without a goal, nor is a task meaningful without a plan." Absolutely agree with the last sentence. Trouble is - I don't see how my process description or flow chart can be interpreted in any other way than creating the goal first, then creating the plan with its tasks - e.g. "split the necessary work into portions, each of which contributes a step towards the overall goal". And in the flow chart, the goal comes first, then the plan (with its tasks). So we're in agreement about the underlying structure and I thought I'd got process and chart to match that structure.
brianjd 2011-06-09T15:36:41-07:00
I can see your point there on proof and arguments. It is a kind of sticky thing because you have the goal of who are the parents of x. You get your first piece of evidence that indicates J may be X's father. So you begin to form a conclusion at that point. So you have evidence that implies J, and your argument that J is X's father is because of evidence E1. So which came first the argument or the conclusion? Hard to say, some would probably put it one way and others another.

On the logging, the result of the search is part of the logging I'm speaking of. A negative result gets reported as well (ex: Search; 1750-1798 St. Mary's Church, Cheshire Baptisms; subject William Arden; Result: Not found).

On the flow chart, I don't have any issues with that. It was the data model you did at the bottom I don't like. But, it's probably just me being pedantic.

I will say I think either way the conclusion statement and conflicting evidence statement should be merged as a single argument statement object. A "conflicting evidence statement" has to be part of the proof, hence it is part of the argument.

Yes, overall I think we are in agreement. I'm just applying scissors to try to reduce the complexity. I want to simplify my life not complicate it.

When you find yourself writing an 800 page book to define a citation methodology you might just be overthinking what should be a simple thing. So, I like simple things, which are hard to agree on when everyone has a say. Feature bloat.
AdrianB38 2011-06-10T05:22:31-07:00
Brian - yes, I reckon we are in agreement. Me being pedantic, I'd always call it a "potential conclusion" when "you begin to form a conclusion at that point." But in data terms, a "potential conclusion" entity type would presumably look identical to the "conclusion" entity type.

I'm sympathetic to the idea of simplifying that data model - for one thing, as I now realise you were indicating, it's very hard to gain a view of what is subsidiary to what in a data model. I automatically cut off any attempt in my mind to get that idea, but others might not. I think I did have conclusion statement and conflicting evidence statement originally as attributes of the proof entity, but as multi-valued attributes (i.e. several conclusions or conflicts per proof) I split them out (I can't stop normalising....) I might review that, particularly as I'd really like to get the right and left hand sides of that data model matching as much as possible to ease the introduction of this sort of stuff into the software.
brianjd 2011-06-10T06:48:47-07:00
Ancient Chinese Saying - The real secret in standards and development is to know when to stop normalizing and when to de-normalize a little.

Few have ever achieved mastery of this Ancient and Mysterious Martial Art. My Normalization Kung Fu is always better on other people's models than my own. ;')
ttwetmore 2011-06-10T07:50:05-07:00
When I make a data model I never normalize. I believe the best approach is to model every major noun concept as an entity, with structured attributes, define relationships between entities that also correspond with real world relationships, and then model all "operations" (verb-like concepts) on the model as functions/methods that take in entities of the modeled types and generate entities of the modeled types.

Later, if an implementor wishes to represent the data model using relational tables, then the implementor can decide exactly what level of normalization to use in what parts of the model to optimize whatever searching or operations they wish to support.

The main purpose of normalization is to convert 1:n and n:m relationships between entity types into 1:1 relationships. This requires the creation of many tables that don't make common sense. (Thus all the confusion with the GenTech model.) The main reason for this is to make SQL queries easy to write and efficient to execute. But we are now in an era where relational databases are no longer required for efficient queries against data. In fact, more and more databases are becoming networks of records that are expressed externally as XML entities. This is causing a resurgence in what were called network databases, and many modern indexing techniques are now used to make querying and operating on network databases just as efficient as on relational databases.

The real advantage of the network approach, in my opinion, is the ability to take a data model with common sense entities, and implement them directly in databases with records based directly on the same common sense entities.

I want my database to have a Person and a Source record with Person records allowed to refer to many Source records. If I had to normalize this model I would have to create a new PersonSource record to flatten out the relationships. I don't want to have to put these records into my model, and I sure don't want to have to explain what they mean! Take a look at the Assertion record in this context.
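
To make the contrast concrete, here is a small Python sketch with assumed record shapes (`Person`, `Source`, `to_link_rows` are names invented for the example). The model keeps the direct Person-to-many-Sources reference; an implementor who wants relational tables can derive the flattened link rows afterwards without those rows ever appearing in the model.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Source:
    id: str
    title: str


@dataclass
class Person:
    id: str
    name: str
    # Direct references: one Person may refer to many Source records.
    sources: List[Source] = field(default_factory=list)


def to_link_rows(people: List[Person]) -> List[Tuple[str, str]]:
    """Implementation-level flattening into (person_id, source_id) rows."""
    return [(p.id, s.id) for p in people for s in p.sources]


# Hypothetical sample data.
census = Source("S1", "Census return")
register = Source("S2", "Parish register")
people = [
    Person("P1", "William Arden", [census, register]),
    Person("P2", "Mary Arden", [register]),
]

print(to_link_rows(people))
```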
gthorud 2011-06-17T15:54:26-07:00
A few comments on the page and the postings above.

Merging the “objective” with “work portion” should make it easy to fit the research process and what is described in the Req Cat’s Administration requirements, but with different terminology – which is no problem. As I see it, Task=Work portion and Objective=Goal.

I assume this will give rise to a Goal Record and a Work portion record.

But reading Adrian’s first posting above, there seems to be a conflict wrt “What is it that a work portion is supposed to represent?” Reading the page, I get the impression that it is to find some info in one or more sources, but reading the posting it is the proof of one or more values in the database (one proof per work item). These are two very different things – they must be separate.

Considering my comments (in the pdf doc I posted here http://bettergedcom.wikispaces.com/message/view/Research+Process%2C+Evidence+%26+GPS/39316654 ) about having to type text twice, once in the research process records and once in the records that are output in reports, I think it will be difficult to avoid typing twice, primarily because different users want to do it in different ways – some want to put proofs in reference notes and others in notes for e.g. a person. But what might be possible is that an application can make it easy – based on preferences set by the user – to copy info from Research process structures to “Output structures”, probably with some linking. If a work portion were to support one proof, and that proof would be output in a reference note, that will only support one way to do things.

About Conclusion statements and Statements of analysis : What is a conclusion statement? Is it the intention to have many conclusion and analysis fields/records holding these statements, and conflicting evidence? If so, is that really necessary? Is it not sufficient to have two fields, one holding text that could be copied by the program into some place where it would be output in reports and another field for text that will not be published?

What do we mean by “value” – are the date/place/value/(note)/role of an event a “value”? If so, can we have reference notes for each date/place/value/(note)/role? It may be difficult to reference some of these if we have several dates/places per event and several persons with the same role type. This is perhaps an issue that could be discussed separately.

Independent of the granularity of values, “One proof may justify several values” could translate into one reference note could be referenced by several values – cf discussion on my doc about E&C.

I have more comments on these postings – later.
AdrianB38 2011-06-19T08:58:39-07:00
Geir - thanks for responding. Maybe if I take things a bit at a time, I might get something done...

1. Re "Merging the 'objective' with 'work portion' " - I'm not sure what you mean by that. The work-portion is a piece of work that gets planned and (with luck) done. The work-portion has one or more objectives, but it might also have other properties, such as description(s) of progress (or links to such), so there's more to a work-portion than just an objective. And you seem to split them yourself later when suggesting "a Goal Record and a Work portion record".

2. "As I see it, Task=Work portion and Objective=Goal" Very possibly, though it depends on precise definitions - which I haven't yet got. On a personal basis I tend to think of a Task as something done by 1 person in 1 place at 1 time - whereas a Work-portion is probably bigger in scope than that. But let me get some proper definitions in and I suspect we may not be far apart.

3. "What is it that a work portion is supposed to represent?" It's a conveniently - and perhaps arbitrarily - sized piece of work, with a defined (or definable?) beginning and end that contributes to the overall focussed goal. It may start with some research - or maybe not. It may well give rise to a proof, but I guess it may not. E.g. the first task may well be "Educate self about where to find records of Royal Navy in Nelson's era". That won't give rise to any proof. I need to add that caveat in.

4. "different users wants to [record proofs] in different ways – some want to put proofs in reference notes and others in notes for e.g a person" Also "If a work portion were to support one proof, and that proof would be output in a reference note". I'm very far from saying where these things should go. Right now I only want to model the concepts and their home in a physical model doesn't concern me. Besides, I reckon many of those saying proofs should go here or there are simply saying it in terms of what is known now. Where they go if we extend the model and - more to the point - where it is accessed from or printed out to, are further questions.

But I can say that I am totally of the anti-duplicate school and if I were designing the application, I'd store the proof once and have an optional cross reference so that when it comes to printing, the right stuff appears where it's needed.

5. "What is a conclusion statement? Is it the intention to have many conclusion and analysis fields/records holding these statements, and conflicting evidence?" That's what I'm not sure of yet. The way I drew the data model, the conflicting evidence is separate and a proof gives rise to several conclusion statements and (potentially) several conflicting evidence statements. But I can easily contradict myself totally by thinking ahead to the physical storage of this data, at which point I start denormalising to cram everything into X level-Y GEDCOM records / XML / JSON, etc.

6. "What do we mean by 'value' – are the date/place/value/(note)/role of an event a 'value'?" Yes. Ditto an attribute / relationship, etc.
Re notes per value - "This is perhaps an issue that could be discussed separately" Yes. Please!

7. " 'One proof may justify several values' could translate into one reference note could be referenced by several values". A reference note, as I said, is about the physical implementation. Ditto which end of the relationship that you put pointers on - or, to put it another way "If A is related to B, does A point to B or does B point to A?" Don't know and at this stage, don't care. That's physical implementation.

Now, assuming Wikispaces doesn't log me out when I press <post>, I shall try to tweak text and model to cope with the above comments.
gthorud 2011-06-21T12:42:28-07:00
Some of the following may be too directed towards “implementation” so it may be that it is more directed towards the next phase, but I can’t help think in terms of implementation.

1. Sorry, I must at some stage have misread the page, I was convinced that there was one objective per Work portion. But reading the new definition of work portion it has only one objective.

2. Agree on your view of task 1-1-1. Seems to me that a work portion would need subrecords for Searches/Other small pieces of work.

3. Well, I hope I understand what a work portion is, but I guess my problem was about the one to one relation between WP and Proof, and now also between WP and objective. Pending definitions of objective and proof, it is difficult to say if that is correct, but my bone marrow asks if there could be more than one proof resulting from an objective. Or are we saying that the proof is the one answer to/result of the objective?

4. Some brainstorming on output follows.

One aspect of “different users want to output proof (or whatever) in different places” is that if you store the proof somewhere in a Research process structure, the receiver of a BG file could select where the proof should be output, making it consistent with the receiver's preferences for all info – also info not in the BG file but output in a report. But this is in theory, I am not sure about practice; one obvious problem is that a proof could be worded differently depending on where it is supposed to end up in output.

Two methods for reference come to mind, one being relations in a data model, the other is “keywords” (many programs use something like [PROOF#1234]) inside fields (incl. notes), where [PROOF#1234] would be substituted by the text of the proof.
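
The keyword method can be sketched as a simple text substitution. The Python below assumes the token has exactly the form [PROOF#1234] and that proofs are held in a dictionary keyed by number; the function name and the choice to leave unknown tokens untouched are assumptions for the example.

```python
import re

# Matches tokens such as [PROOF#1234]; the digits are the proof id.
PROOF_TOKEN = re.compile(r"\[PROOF#(\d+)\]")


def expand_proofs(text: str, proofs: dict) -> str:
    """Replace each [PROOF#n] token with the stored proof text.

    Tokens whose id is not in `proofs` are left as they are, so a
    missing proof stays visible instead of silently disappearing.
    """
    def replace(match):
        proof_id = int(match.group(1))
        return proofs.get(proof_id, match.group(0))

    return PROOF_TOKEN.sub(replace, text)


# Hypothetical sample data.
proofs = {1234: "The baptism register names J as father."}
note = "Born 1801. [PROOF#1234] See also [PROOF#9]."
print(expand_proofs(note, proofs))
```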

Even if it should turn out to be a problem to be able to support several ways to output the text by reference, supporting one way (two) is better than none. Two ways that might be possible are to use “proof notes” (cf Hannah Preston example from Gene) or putting the whole proof in a reference note. My guess is that the proof text could be written the same way in both cases, just separated from the source references when using proof notes.
Also, if you want to “rephrase” the proof when output somewhere else, a program could allow the user to copy the proof to that place and then edit it – you could even have a non-printing variant of [PROOF#1234] somewhere in the resulting text as a reminder that the “semantics” of the proof is used in that text, so if you change the proof, you would know the other places that depend on the proof.

Well, as I said … brainstorming.

6. Looking at the flow diagram I understand that the analysis and conclusion statements are part of the proof. (I am a bit confused when you write that the proof gives rise to conclusion statements, I read that as the statements are written after you have written the proof.) I wonder what the purpose of having analysis and conclusion statements as separate “somethings” would be? Is it because different items of analysis follow from specific pieces of evidence? (Or could several items of analysis end up in one (or more) conclusion statements?) Or do we expect analysis to be output in a place separate from conclusions, or would conclusions – and not analysis – be output somewhere? (Or would different conclusion statements be output in different places?) Maybe the simplest thing is to have the analysis and conclusions in the same field, and probably also the conflicting evidence.

Regarding replacing the Citation (or rather Reference Note) entity with a proof entity:

If the proof is split into analysis and conclusion, one problem with importing Gedcom to BG would be that the Note in the Citation of Gedcom (which I think is often used to carry the analysis/conclusion/proof statement) is not split into analysis and conclusion. Thus you would have to have an alternative representation of these, where they can be in the same field. It would depend on how the components of the split proof are used, e.g. functionality and how the components relate to other entities, if this “backwards compatibility field” causes problems.

Also, it would limit the way that proofs could be written to always fit in a citation.

I think it is better to link to the proof from the Reference note entity, and if the ”proof” text in that entity is not present, use the one in the Proof entity. If the proof<->ref note link is present, use the links from the proof to its Lookup (source). If the link is present, and the ref note contains proof text, use the proof text rather than that in the proof entity. This will also simplify the import process into a program not supporting the research process records internally.
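
The fallback rules in the previous paragraph can be written out as a short Python sketch. The record and field names (`ReferenceNote`, `proof_text`, `lookup`) are assumptions for illustration; only the precedence order is taken from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Proof:
    text: str
    lookup: Optional[str] = None  # link from the proof to its Lookup (source)


@dataclass
class ReferenceNote:
    proof_text: Optional[str] = None  # text held directly in the reference note
    proof: Optional[Proof] = None     # optional link to a Proof entity


def resolve_proof_text(note: ReferenceNote) -> Optional[str]:
    """Text in the reference note wins; otherwise use the linked Proof."""
    if note.proof_text:
        return note.proof_text
    if note.proof is not None:
        return note.proof.text
    return None


def resolve_lookup(note: ReferenceNote) -> Optional[str]:
    """When the link is present, the source comes via the Proof's Lookup."""
    return note.proof.lookup if note.proof is not None else None


# Hypothetical sample data.
proof = Proof("Full argument text", lookup="Parish register entry")
linked_only = ReferenceNote(proof=proof)
overridden = ReferenceNote(proof_text="Short reworded proof", proof=proof)

print(resolve_proof_text(linked_only))
print(resolve_proof_text(overridden))
print(resolve_lookup(overridden))
```

A program that does not support research process records internally could apply `resolve_proof_text` once on import and keep only the resulting text.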
ttwetmore 2011-06-16T17:20:01-07:00
Matching Personas Algorithmically
I'm not sure exactly what page to post this to, but I think this one is a good match.

I have often advocated techniques of algorithmically matching evidence records together as part of the research process to help decide which evidence records we have collected refer to which real persons. The techniques fit into step 4 of the research process as described on this page (Select & Analyze the Evidence). I have also described my experience working in this area over five years of employment, where I wrote algorithms that combined billions of evidence records about persons, down to a much smaller set of real person records.

Other than to say that my interest in the "evidence person" concept, aka the "persona" record, is due, at least in part, to the fact that these evidence-based records are prerequisites (the "input") for these algorithms. And my long held belief that only by using these algorithms can we address the problems of solving the "family tree of humankind."

In previous posts I have alluded to the field of "nominal record linking," that has a long and noble history, extending back into pre-computer times. This area has generated much theoretical work that has produced probabilistic models and processes for carrying out record matching, for finding records that refer with computable high likelihood to the same persons, and for follow-on pedigree construction.

Here in the Better GEDCOM forum my ideas along these lines have generally run up against skepticism, and I haven't been very convincing of the value of these algorithms. And since some of my concerns about our data model revolve around the needs for supporting these processes, this causes some to feel a lack of legitimacy to those models.

For anyone interested in reviewing the field of record linking, I think you will find this document interesting.

http://bartonstreet.com/deadends/RecordLinking.pdf

This is the Proceedings of the Workshop on Exact Matching Methodologies, held in Arlington, Virginia, in May, 1985, a very long time ago as the computer flies. A wonderful thing about these proceedings is that the first series of papers are historical, from the 60s and 70s, that dealt with the probabilistic theories and models that justify the techniques. The other papers bring the reader "up to date" as of 1985. Of course things are very dated in these papers. There is stuff written about how to sort records on magnetic tapes so that matching can be done by reading data from two magnetic tapes with two sets of records so that the records to be matched can be found in serial order. Of course, with databases of today we no longer have to worry about such things. What is of lasting value, though, is the theoretical work that demonstrates how the properties (PFACTS in our parlance) being matched, and the values of those properties, lead to weighting factors that can be used to provide probabilities on the correctness of matches.

This is a long file, about 14M. I have also separated the overall PDF into separate PDFs for each paper. If anyone would like any of those separately you can let me know and I will email it to you.

Tom Wetmore
AdrianB38 2011-06-17T07:12:11-07:00
Thanks for the link Tom - and that's not meant in an ironic fashion. I'm afraid I never had much liking for the study of statistics on my maths course, so there's no innate understanding of the papers on my part that makes me go "Ah - of course!" Nonetheless, some simple things in the 1st 2 papers (the only ones I've attempted to look at (not read properly!) so far) stick out at me.

Firstly it implies that if you get 2 mentions of couples of the same name, then the probability that the 2 couples are the same couple differs depending on the rarity of the names involved. So if it's John Smith and Mary Jones in London, it means probably very little, but if it's Theophilus P Wildebeest and Tamar Pleass in a small village in Somerset, it's quite meaningful. I reckon we would all think that, but there's clearly a spectrum that would be worth looking at and understanding if we could. Theophilus P Wildebeest and Tamar Pleass in London? Theophilus P Wildebeest and Mary Jones in Somerset? Somewhere there's some lessons to be learned about coverage. And, of course, getting something that's statistically likely doesn't mean it's true. My Salter families in Bristol used a couple of odd first names frequently, one being "Thomas Sosthenes". But because they repeated such a name, father after son, etc, the full name is not diagnostic when taken _on_its_own_. Still - if there were a means to analyse frequency of names, it might reassure some and tell others that they need more evidence.

Secondly, the 1st two papers were talking about scenarios where the raw data had a good chance of being there. This might very well, therefore, work well with England & Wales post civil registration (1837) and its censuses - my scepticism kicks in when going pre-census and pre-civil registration, when we're dependent on optional church records. Ideally, the pro-matchers should be able to analyse the population numbers vs. church record numbers and say things like "It's possible in England in the 1780s" (say) "but not possible in the 1650s because of the breakdown in church recording during the Civil War and Commonwealth".

So - interesting in alluding to some of the possibilities and I hope in being able to analyse the limitations as well.
ttwetmore 2011-06-17T09:30:19-07:00
Adrian,

All good points. In the work I've done combining records we did use the frequency statistics of first names and surnames from the U. S. census as part of the matching criteria.
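
To make the name-rarity point concrete, here is a minimal sketch of a frequency-based agreement weight in the spirit of the probabilistic record-linking papers. It is not the code from the work described above: the frequency table, the value of m and the function name are all assumptions made for illustration, and the numbers are invented rather than taken from any census.

```python
import math

# Hypothetical relative frequencies of surnames in a population
# (invented numbers, for illustration only).
SURNAME_FREQ = {"smith": 0.01, "jones": 0.005, "wildebeest": 0.000001}


def agreement_weight(name, freq_table, m=0.95, default_freq=1e-6):
    """Log2 weight for two records agreeing on `name`.

    m is the assumed probability that two records of the same person
    agree on this field. The chance that two unrelated records agree
    is approximated by the name's relative frequency.
    """
    u = freq_table.get(name.lower(), default_freq)
    return math.log2(m / u)


common = agreement_weight("Smith", SURNAME_FREQ)
rare = agreement_weight("Wildebeest", SURNAME_FREQ)
```

Agreement on a rare surname earns a much larger weight than agreement on a common one, which is the spectrum Adrian describes; where to set the thresholds on the summed weights is a separate question.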

I find the articles difficult to understand as well. Some I've had to work through a number of times, getting a few paragraphs further each time. I'm now writing some genealogical combination software as a demonstration for an organization, and I need to understand the probabilities so I can get the right weighting factors. Thus I have a stack of these papers now on my night stand for pleasant late night reading!

One thing about most of the articles is that they assume merging two different files, where each file has one kind of record. Going along with this assumption is the idea that each file can only have the same person once (i.e., there wouldn't be two birth registrations of the same person). They are merging two kinds of records from two different sources, and each person mentioned in each kind of record is only mentioned once there.

In "our" case we are merging (I much prefer calling it joining or combining because we don't want to lose the data) records that come from all kinds of sources, and we can't make any assumptions about how many times each real person is mentioned in the records. The facts that we have so many kinds of records, and that those records, even if of the same kind, can be so different in what attributes are available or not, put a lot of strain on some of the assumptions in most of the papers.

So it's hard to know how to put a boundary between purely ad hoc combining code versus trying to figure out how to do things with computable probabilistic likelihood thresholds that really make sense.

A reason I keep bringing this topic up is to keep the important (in my humble opinion) concept of the "persona" record under Better GEDCOM scrutiny. I believe the persona record is the key addition needed to genealogical data models to enable support for the full evidence and conclusion (aka the research or historical) process. Better GEDCOM seems split between some who feel that a richer citation scheme is the key for the "evidence transition," and those that feel the persona record and concomitant multi-tier person structure is the key. I am happy to see that Better GEDCOM seems receptive to the idea of persona records.

I hope it is clear that the persona concept is becoming ever more important to us as genealogists. We are now in an on-line genealogical era where we can cheaply (often freely) search on-line databases for genealogical records. More and more of those records we can now load into our own computers and into our personal genealogical software. Those records are exactly persona records, exactly codified evidence, exactly the nominal records of the proceedings papers, though the programs of today force us to treat them as new, full person records that we then have to consequently and destructively merge into existing persons. Once we have made the transition into the evidence paradigm, however, these records will stay in our databases as exactly what they are, evidence-based persona records. The point here is that we are now forced to deal with persona records regularly, though we have to look at them through the non-rose-colored glasses of the wrong paradigm.

There is another important point I'd like to make about the algorithms. I do talk about using them to automatically combine persona records into multi-tier person trees. However, there is no need for the algorithms to actually do the combining. The algorithms can also be used simply as expert assistants that can process all the records in a database and then make suggestions about 1) personas you have brought together but probably shouldn't have, and 2) personas that you haven't brought together but probably should. That is, the algorithms suggest relationships and changes, but you are in charge of making all the explicit changes.
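
As a rough sketch of the "expert assistant" mode just described, assuming some pairwise scoring function already exists: the function below never changes the grouping, it only lists pairs for the researcher to review. The names `group_of`, `score`, `low` and `high` are hypothetical, as are the example scores.

```python
from itertools import combinations


def suggest_changes(group_of, score, low=0.2, high=0.8):
    """Return review suggestions without altering any grouping.

    group_of maps a persona id to the person it is currently assigned to.
    score(a, b) returns an assumed match likelihood between 0 and 1.
    """
    suggestions = []
    for a, b in combinations(sorted(group_of), 2):
        s = score(a, b)
        same_person = group_of[a] == group_of[b]
        if same_person and s < low:
            # 1) brought together but probably should not have been
            suggestions.append(("review-split", a, b, s))
        elif not same_person and s > high:
            # 2) not brought together but probably should be
            suggestions.append(("review-join", a, b, s))
    return suggestions


# Invented example: three personas, two of them currently combined.
group_of = {"p1": "A", "p2": "A", "p3": "B"}
scores = {("p1", "p2"): 0.1, ("p1", "p3"): 0.9, ("p2", "p3"): 0.5}
result = suggest_changes(group_of, lambda a, b: scores[(a, b)])
```

Pairs that fall between the two thresholds produce no suggestion at all, so the researcher stays in charge of every explicit change.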
gthorud 2011-06-17T16:44:02-07:00

There is a huge project – or rather a “want to be huge project” – that intends to create a database containing all persons that lived/lives in Norway from about 1800 to 1964 (the last 54 years will be for professional researchers only, incl medicine). The people that initiated the project have been working on “statistical record linking” for demographic purposes for 20-30? years, and it seems they intend to try those methods in the project. We will see in perhaps 10 years time how useful those algorithms are – for genealogy purposes I have my doubts – but if you accept 90% accuracy it may be ok. The problem as I see it is, if those methods work, where has all the fun in genealogy gone?

Here is a link to the project page – there are links to a couple of English articles on record linking there http://www.rhd.uit.no/nhdc/hpr.html

I must admit I have not read the pdf doc that Tom posted – my head is still working primarily on how to “lay?” tiles – but I see some use for these algorithms in terms of coming up with SUGGESTIONS – as Tom describes.

But, there are also problems depending on culture – here true Patronymics were widely used until about 1900, and abandoned by law as late as 1922. When half the population uses one of the 4-5 most popular patronymics as surname, you will get a lot of false matches. I have tried to use the tools in genealogy programs that are supposed to find duplicate persons – the list of (false) duplicates was HUGE.
ttwetmore 2011-06-17T19:12:24-07:00
Geir,

Thanks for the info on the Norwegian project. It will be interesting to see if they can make it work. I hope Norwegian genealogy can still be fun if they are successful. It's interesting you said that about fun. I had a long talk with the chief software architect of Ancestry.com about the matching algorithms a year ago. He said the same thing -- they had thought about the idea of automating linking, but had decided against it because it would take the fun out of genealogy. Frankly I thought that this meant that Ancestry.com had tried to solve the problem but had failed, so he was trying to put a good face on their failure. But if you feel the same way, maybe he was right.

"Laying tiles" is the right way to say it!!

Yeah, the algorithms should start out as only providing suggestions. Only after some evaluation should we think about going further. My topic was "Matching Persons Algorithmically" rather than "Linking Personas Automatically!!"

You point out problems based on culture. Yes. When I did this work in the other application we found out that we could organize comparing different attributes of persons by using what we called "software experts." For example, we had a software expert for handling name comparisons. The experts provided a convenient way to create plug-ins as needed to handle dealing with special cases of different types of values, the kinds of differences that come from cultural differences. So we had a name expert, a date expert, a place expert, and so on. This provided a good separation between the clean overall algorithmic flow and the other software, which could be as messy as necessary, for handling different naming schemes, different dating schemes, and so forth.
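
A minimal sketch of that plug-in arrangement, assuming each expert is simply a function that scores two values of one attribute between 0 and 1. This is an illustration, not the design of the application mentioned above; the experts here are deliberately crude and every name is hypothetical.

```python
def name_expert(a, b):
    """Toy name expert: case-insensitive exact comparison."""
    return 1.0 if a.casefold() == b.casefold() else 0.0


def year_expert(a, b, tolerance=5):
    """Toy date expert: full score for equal years, falling to 0 at `tolerance`."""
    diff = abs(int(a) - int(b))
    return max(0.0, 1.0 - diff / tolerance)


# Registry of plug-ins, keyed by attribute. A culture-specific expert
# (for patronymics, say) could replace the "name" entry without
# touching the flow in compare().
EXPERTS = {"name": name_expert, "birth_year": year_expert}


def compare(rec_a, rec_b, experts=EXPERTS):
    """Average the expert scores over the attributes both records carry."""
    shared = [f for f in experts if f in rec_a and f in rec_b]
    if not shared:
        return 0.0
    return sum(experts[f](rec_a[f], rec_b[f]) for f in shared) / len(shared)


similarity = compare(
    {"name": "Ola Hansen", "birth_year": "1850"},
    {"name": "ola hansen", "birth_year": "1852"},
)
```

The overall flow stays the same whatever the naming or dating scheme; only the entries in the registry change. A real system would weight the experts rather than take a plain average.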

The features that try to find duplicates in today's software are quite primitive compared to the probabilistic schemes that would be needed to assure highly probable suggestions. In the case you mentioned, when half the population carry only 4 or 5 surnames, the probabilities might work out in such a way that far fewer linking suggestions would be made at the various probabilistic thresholds.
GeneJ 2011-07-10T21:34:37-07:00
"Entity "Citation PAF style" is meant to refer to the citation structure as encoded in GEDCOM, PAF, etc, etc."
Perhaps you could elaborate on what this is?

We have a Wiki page, "Sources and Citations in GEDCOM."
http://bettergedcom.wikispaces.com/Sources+and+Citations+in+GEDCOMA

To my knowledge, "GEDCOM" isn't a citation style, it's a protocol for transferring source and citation data between genealogical software programs. And, oOoo ... it DOESN'T WORK.
AdrianB38 2011-07-11T01:44:50-07:00
OK - at risk of my explanation confusing even more....

1. When I said "style", I didn't mean "style" as in Chicago Manual of Style - i.e., not style as in "use italics here and quotes there..."

I meant "in the manner of..." - so the phrase "Citation PAF style" is intended to mean "A citation dealt with in the manner that PAF and GEDCOM deal with it".

2. I also am talking about data entities here, not printed outputs. So an entity of "Citation", when I'm talking about PAF - or GEDCOM - refers to a chunk of data in the file or database in question, not a set of printed words in a report.

So, my entity "Citation PAF style" refers to a chunk of data in my data model. That chunk of data contains the same sort of data, coded up in the same manner, that GEDCOM and PAF do. Except - of course, it would have the extra bits to make it work properly.

3. The reason I write things like "Citation PAF style" or "Citation" in quotes is that people like the Ancestry Insider say that what the PAF (and by extension, GEDCOM) call a citation, isn't. I did ask him to explain that once and, paraphrasing (so any error and omission is my fault, not his), the gist of it seems to be that
(a) a citation should tell you where to find the original source of some information;
(b) what PAF label as a citation in their software doesn't do that - it simply points to a source record (possibly adding to it bits like page number or a "quality" marker). Only when you go to the source record in question, can you pick up sufficient further bits of data to stick everything together to make up the correct printed citation - or, in this case, reference note.

So - I can accept that strictly speaking, what PAF (and GEDCOM) call a citation, isn't, because it's only a pointer and (potentially) a bit of data.

BUT - if you draw up a meaningful data model that uses source-records, you simply have to have a chunk of data that looks like what PAF (and GEDCOM) call a citation - the chunk points to a source record; it adds in things like page number, quality and other things as well, perhaps; it does NOT contain the full stuff like the source's title, author and publication details, because they're stored against the source record. It doesn't contain them and it shouldn't contain them.

So it doesn't contain the data for a full citation, so some people don't want us to call it a citation but we do need it, so what do we call it? Well, in lieu of any better term (because source-pointer has been proposed and disliked as a bit too-IT) I'm going to carry on calling it a "citation" (in quotes, indicating it's a sort of citation) or citation-PAF-style.

OK - that's the full explanation - apologies if it serves to confuse further.
GeneJ 2011-07-11T07:23:30-07:00
We should retire (RIP, invite to the "remember when" archives and refer to in past tense) concepts of "PAF" citations or "GEDCOM" citations.

I'll try to add a page "Forms of Citation More About Citations" to the "About Citations" dialog.

I'm confident that even BetterGEDCOM can get this. --GJ

Well, I tortured the page name on first attempt. I'll put that content here.
http://bettergedcom.wikispaces.com/Forms+of+Citations
GeneJ 2011-07-14T20:20:09-07:00
Robert Raymond (FamilySearch), "Interoperable Citation Exchange 2009-03-11.pdf"

http://bit.ly/obUWGq
GeneJ 2011-07-14T21:00:10-07:00
P.S. Slide no. 21 refers to "Smallest-to-largest."

See also, EE (2007), p 118-19 for International Differences, "Traditionally, citations in the United States have used a dual system that calls for one arrangement in reference notes and another in the Source List." ...

Mills goes on to recognize that in the US, reference notes citations "start with the smallest element in the citation and work up to the largest (the archive and its location)." In contrast, she also writes that internationally, reference notes typically "start with the largest and work down to the smallest."

See the referenced text for her further comments.
GeneJ 2011-08-06T12:02:32-07:00
FYI only.

Robert Raymond, "Citing Online Sources," wiki syllabus; _FamilySearch_ (https://wiki.familysearch.org/en/Citing_Online_Sources: accessed 2 Aug 2011).

Opens with, "This is the syllabus for a class taught by Robert Raymond and represents his private opinion ..."
AdrianB38 2012-07-08T08:40:16-07:00
What constitutes proof?
Michael Hait's "Planting the Seeds" blog has a posting "Notable Genealogy Blog Posts, 8 July 2012" (https://michaelhait.wordpress.com/2012/07/08/notable-8-july-2012/) that directed me to Elizabeth Shown Mills's "Evidence Explained" website and her QuickLesson 8 "What Constitutes Proof?" (see https://www.evidenceexplained.com/content/quicklesson-8-what-constitutes-proof ).

Interesting to see that ESM says "Reliable proof has 11 basic building blocks" and lists them as:
1. Thorough research;
2. Thoughtful evaluation of the quality of each source;
3. Careful notetaking and documentation;
4. Unbiased appraisal of the informant for each piece of information;
5. Accurate interpretation of the information each source provides;
6. Knowledgeable placement of that information in relevant context;
7. Skilled correlation of the details yielded by all the records;
8. Creative milking of clues that point to new resources;
9. Critical analysis of the evidence drawn from the sources, individually and collectively;
10. Logical rebuttal of any and all evidence to the contrary;
11. A written summation of the evidence that supports our conclusion—not a list of sources but a well-reasoned explanation of why we believe the body of evidence justifies our conclusion. (end of quote)

I've no problem with any of that at the headline level. But 2 issues pop into my mind.

1. No mention of the Genealogical Proof Standard.
2. No guidance on how much proof is enough.

Re item 1 - it seems to me that if we're trying to persuade the world at large that it's not enough to cite an image of a census page for John Doe in Chicago, but you need also to show it's the right John Doe - which is what "proof" is all about - then a co-ordinated attack would be better. As it is, the casual reader is left unclear how ESM's 11 points relate to the GPS's 5. Yeah, that includes me too. Is she saying the GPS is inadequate? Actually, I doubt it, I think she's expanded the explanation of some points that probably needed expanding. I think. But the confusion is something we don't really want.

Re item 2 - how much is enough? I actually think the GPS has it better than ESM here when it says "Reasonably exhaustive search". ESM simply has "Thorough Research" and says "systematically, we seek out every relevant source". I can't agree with this. I could go into Chester Record Office and look at all the account books of all the estates in case I found something relevant to my relatives' occupations. But those sources are all paper based and I'd be there forever. And the risk of missing something is, I suggest, slim - note that risk is a function of likelihood AND of impact (it's quite _likely_ I'll end up missing employments of various farm labourers in my relatives but I probably already know they're labourers of a sort from the censuses - i.e. no material impact.). I reckon ESM is trying to get there with the word "relevant" but it needs working on.

Whereas the GPS and its "Reasonably exhaustive search", precisely because it doesn't define the term, makes you think about what's relevant and practical. Or at least, that's my take on it.

So - if BG is aiming to support the GPS, how do we approach ESM's 11 steps?
WesleyJohnston 2012-07-11T13:07:00-07:00
I have not read all the above posts, but it seems to me that the consensus is that "global data standard should not tie itself to a given procedural standard".

I also have a concern about the "reasonably exhaustive search" since any such search is bound to come up with many sources consulted with negative results, which normally are not represented in genealogical databases, so that attempting to support that really would push genealogical databases into a very new direction ... which most users really would not be interested in. It raises the issue of separate databases for documenting all searches (a research support database) and documenting positive results (what we now have as genealogical databases).

On another aspect of "proof", since I use Ancestry trees, with all their limitations, as my master databases, I have had to deal with the issue of explaining some extremely complex decisions about what I have chosen to represent in the trees. So I have used Ancestry's "story" feature that allows a document to be attached to one or more people, and I have written some very extensive research notes there in which I go through the steps of presenting the problem, presenting the evidence, weighing the evidence, and reaching the conclusions that I have reached -- which in some cases are clearly not "proof" since real "proof" simply cannot be had from the existing records that have been found. Some of these research notes, my name for them, are documents of many pages, quite extensive.

My point is that support for something like this free-form presentation of the steps of reaching the conclusions is essential, in a way that allows it to be tied to the specific people and events (and any other entities desired, e.g. places) involved in the evidence and the consideration of the evidence.
AdrianB38 2012-07-11T15:01:57-07:00
"something like this free-form presentation of the steps of reaching the conclusions is essential"
I think how the written proof is stored is perhaps up for debate. I do toy with the idea of providing a "proof" item with free-text sub-parts, perhaps categorised by the GPS headings as, despite it being one particular standard, nonetheless it seemed a reasonable set of headings to start with.

Pro this idea
- supports GPS (not _supports_ not drives);
- splitting down into (any) components is always going to help with a complex case;

Against this idea
- structure of proof is then inflexible;
- another process might require other proof sub-parts (see previous point);
- less flexibility in the written format of the proof if you print it out;
- altogether over the top if all we want to do is use the 1930 census to record that someone has a radio (note - the meat of that proof is in the proving that this is them in the census - that's a full GPS proof, the possession of a radio is simply a corollary to that);

Even if we have sub-parts, those sub-parts are going to be free-form text. That much is clear to me.

Wesley - re negative searches - I somehow have the idea that a large portion of my negative searches are repeats, simply because I never wrote down what I was looking for. There has to be a better way than relying on free text notes attached to... "oops - I forget where I put that note!" And I think that negative searches are not dissimilar from those searches where we prove that there is only one such person - except the number is different. Either way there's a need to record these, either in free-text if we can't think of better, or mildly split up, as part of a standard proof (e.g. "There is only one Aloysius Doe born in Scotland in 1881" so the record in America of an Aloysius Doe born in Scotland in 1881 must refer to the same person.)

So somehow I feel there must be a better way than just free-texting it all. After all, we could free-text the lot in a word-processor but we don't.
AdrianB38 2012-07-11T15:03:16-07:00
D'oh
Pro this idea
- supports GPS (not _supports_ not drives);

should read
Pro this idea
- supports GPS (note _supports_, not drives);
Alex-Anders 2012-07-11T15:45:58-07:00
'I also have a concern about the "reasonably exhaustive search" since any such search is bound to come up with many sources consulted with negative results, which normally are not represented in genealogical databases, so that attempting to support that really would push genealogical databases into a very new direction ... which most users really would not be interested in. It raises the issue of separate databases for documenting all searches (a research support database) and documenting positive results (what we now have as genealogical databases).'

For an individual's purpose, a separate database might work, but for others, where should it be in the Proof result?

Surely, events etc that are excluded, after research should be included in the same outcome. These go to prove what is not correct and thus if recorded, assist others in also excluding them.
ttwetmore 2012-07-12T09:46:10-07:00
In my view the goal of BGEDCOM should be a data standard for expressing three layers of information:

1. The Sources of information.
2. The Evidence found in those sources.
3. The Conclusions derived from the evidence.

If you glance at the GEDCOMX project you will see that they have a source metadata model, a record model, and a conclusion model, in one-to-one correspondence with these layers.
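
As a rough sketch of how the three layers might hang together in code: the class and field names below are assumptions for illustration only, and are not taken from BetterGEDCOM, GEDCOMX or GenTech. The example person and census entry are invented.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Source:
    """Layer 1: where information comes from."""
    title: str


@dataclass
class Evidence:
    """Layer 2: what one source says about one person (a persona)."""
    source: Source
    locator: str            # where in the source the entry is found
    claims: Dict[str, str]  # values as found, left uninterpreted


@dataclass
class Conclusion:
    """Layer 3: what the researcher decided, and on what evidence."""
    statement: str
    evidence: List[Evidence] = field(default_factory=list)
    note: str = ""          # free-text reasoning


census = Source("1881 census (hypothetical example)")
persona = Evidence(census, "folio 12", {"name": "Thomas Salter", "age": "34"})
birth = Conclusion(
    "Thomas Salter born about 1847",
    [persona],
    "Age 34 in 1881 implies birth about 1847.",
)
```

The point of the layering is that the persona stays in the database as it was found; the conclusion refers to it instead of absorbing it.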

The GenTech model includes an administrative model where a researcher can keep track of to do list items, and searches that are underway, and the results from those searches, and so on. There has been some discussion here that the BGEDCOM model should include this as a 4th layer. Here is where one would keep track of successful and failed searches in different records types.

There has also been discussion here about how much support BGEDCOM should give to writing professional reports. Clearly it is easy to add rich-text note capabilities at any point in the data model to allow an author to have free rein to compose all the material needed to produce a document.

Some of this discussion has revolved around the idea of a research note that would be a rich-text note added to a source reference (the link between evidence and the source it comes from) in which an author could say anything they wish about the evidence and the source. Citation templates would allow these rich-text notes to show up as footnotes, bibliographic entries, or in-line text.

And by adding rich-text notes to the conclusion objects, an author is free to compose anything they wish about why they came to whatever conclusions they came to.

I for one would much prefer we use these rich-text notes as the way to support GPS in BGEDCOM. I sure don't want there to be a reasonably exhaustive search object.
AdrianB38 2012-07-12T12:51:55-07:00
Tom - adding what is, in effect a richer-format, note item at various points is, it seems to me, an attractive means of "supporting" GPS. Not least because any attempt to break down the components of a proof argument and then stick them back together in software for printed output will probably result in a poorly constructed bit of writing.

If I follow this route in modelling the data, I feel the need to tweak your suggestion about the positioning of the rich-note containing the proof-argument. Rather than sit as an extension to the source reference (the link between evidence and the source it comes from), I would want it to bear the same relation to the "fact" that a source does now.

If we consider the relation between a "fact" (i.e. property / fact / attribute....) and a source-record, it's a many-to-many. I believe that the relation between a "fact" (i.e. property / fact / attribute....) and a proof-argument-note is also many-to-many.

For instance, I might want to record that X is the daughter of Y and that I have two reasons for believing this, i.e. two proof-argument-notes. Maybe an original, weaker one, and a later, more robust one.

The same proof-argument-note might also justify several "facts" - firstly that Thomas in the New York 1930 census (who is new to me) is the same as Thomas in the New York 1910 census (who I already have in my database) (the 'proof of identity') and, as a series of corollaries, that 'my' Thomas therefore lives at X in 1930, has an occupation of Y, access to a radio, etc. In other words, proof-argument-note to fact is one to many when viewed in that direction, as well as many to one.

Or put perhaps simpler - where in a current GEDCOM based program one sees a reference to a source-record justifying a fact, in this scheme one would see EITHER a source-record OR a proof-argument-note.
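
A minimal sketch of that many-to-many relation, assuming facts and proof-argument-notes are identified by simple string ids. The class name and the ids are hypothetical; a real model would presumably hold this as a link entity between the two.

```python
from collections import defaultdict


class JustificationLinks:
    """Many-to-many links between facts and proof-argument-notes."""

    def __init__(self):
        self._notes_by_fact = defaultdict(set)
        self._facts_by_note = defaultdict(set)

    def link(self, fact_id, note_id):
        self._notes_by_fact[fact_id].add(note_id)
        self._facts_by_note[note_id].add(fact_id)

    def notes_for(self, fact_id):
        return sorted(self._notes_by_fact.get(fact_id, ()))

    def facts_for(self, note_id):
        return sorted(self._facts_by_note.get(note_id, ()))


links = JustificationLinks()

# One fact justified by two notes: an earlier weak one and a later robust one.
links.link("X-daughter-of-Y", "proof-weak")
links.link("X-daughter-of-Y", "proof-robust")

# One note justifying several facts: the identity proof and its corollaries.
for fact in ("thomas-identity", "thomas-residence-1930", "thomas-radio"):
    links.link(fact, "proof-1930-census")
```

Asking for the notes behind the parentage fact returns both arguments, and asking for the facts behind the census note returns the identity proof together with its corollaries.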

Of course, there is then the pain of wondering whether the proof-argument-note should contain full-fat references to source-records.
ttwetmore 2012-07-12T19:48:02-07:00
Adrian,

I agree with you. I would want the rich-text note that accompanies the source reference to only document the specific item of evidence being referred to. This is one of the things that have been called research notes. (Remember the arguments we have had about what a source reference [sometimes erroneously called a citation] should contain -- some have said it should only contain the information needed to locate the evidence in its source; while others have said it can contain more information, for instance a summary of the evidence, or a transcription of it, or a comment on its assumed accuracy, or on its assumed provenance, basically anything that the researcher believes is important and wants to call the reader's attention to. Where people have come down on this argument has some relationship to whether and how they believe evidence should find its way into our databases. No sense going down that path here though.)

For the proof statements themselves, as you say, they would be attached to the actual facts that make up the conclusion level persons. For example the birth event fact in a conclusion person could contain a rich-text proof note that describes all the sources of birth information that were discovered, and why the specific birth information chosen for this conclusion person was chosen. To my mind that statement is the proof statement of GPS.

For my models, I just go whole-hog and say that a note can be attached to anything anywhere.
AdrianB38 2012-07-13T08:32:13-07:00
"I agree with you". Oh good.

"the proof statements themselves ... would be attached to the actual facts". Curiously I didn't realise it, but that's what I was saying, in effect. And GEDCOM 5.5 can actually do exactly that, right now - at least for events and attributes, as the Event_Detail (which is used by other structures) can contain from none to many Note_Structures, _each_ of which may be either an in-line note or a link to a free-standing, level0, shareable note record. And that structure also allows Source_Citations currently.

So if GEDCOM 5.5 can actually do exactly that, right now, i.e. it can link from events etc. to note records containing proof statements, why aren't I using them now? Well, occasionally I do, but usually only with a link to a proof-statement that comes from the top level "inside" a person - i.e. a level1 link inside a level0 person. I don't do linking from inside an event inside a person, simply because while my software (FamilyHistorian) will do it (it should allow me to construct any GEDCOM-legal data), it only does it from a general purpose screen with no specific item names and the normal screen designed for input of an individual's data only shows a sub-set of the possible items. Clearly, the designer thought one note per event (or attribute) was sufficient for normal usage, whereas I'm talking about two notes (at least) - one to expand on the event's details (my conventional usage), the 2nd (or 3rd) to contain a link to a proof-statement note.

Persuading software designers to make the full array of links easily available might be feasible if we added to the Note_Structure a value indicating what type of Note was being inserted - e.g. a type of "Expansion" for additional details; a type of "Proof-Statement" for what we're talking about and I'm sure other types could arise. That way they'll see the justification for multiple notes.

In fact, something like this will probably be necessary if we want the Citation Templates to pick up linked Proof-Statement notes and print them as (say) End-Notes while linked Expansion notes would be printed inline with the actual values of the event for the individual.
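
A minimal Python sketch of that idea, assuming a hypothetical note type field with the two values suggested above ("Expansion" and "Proof-Statement"). Neither the field nor the class names exist in GEDCOM 5.5; they are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class Note:
    text: str
    note_type: str = "Expansion"  # hypothetical: "Expansion" or "Proof-Statement"


@dataclass
class Event:
    label: str
    notes: list = field(default_factory=list)


def render(events):
    """Print Expansion notes in-line and Proof-Statement notes as End-Notes."""
    body, end_notes = [], []
    for event in events:
        line = event.label
        for note in event.notes:
            if note.note_type == "Proof-Statement":
                end_notes.append(note.text)
                line += f" [{len(end_notes)}]"   # marker pointing at the end-note
            else:
                line += f" ({note.text})"        # expansion stays with the event
        body.append(line)
    numbered = [f"[{i}] {text}" for i, text in enumerate(end_notes, start=1)]
    return "\n".join(body + ["", "End-Notes"] + numbered)


birth = Event(
    "Birth: 12 Mar 1851",
    [
        Note("baptised privately"),
        Note("Date rests on the baptism entry.", "Proof-Statement"),
    ],
)
print(render([birth]))
```

With the type recorded on each note, the output routine needs no guesswork about which note goes where, which is the justification a software designer would need for allowing several notes per event.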

Note - everywhere above that I've said "event", you should read the need to be able to do the same against attributes, individuals, families, relationships, etc, etc.

Adrian
ttwetmore 2012-07-13T10:55:07-07:00
Adrian,

Thanks. All very good points. I'm sure Louis will be happy with the conclusion that GEDCOM can already do it.

I would put a note under the person for an overall proof statement about the person as a whole, and/or notes inserted with specific attributes (e.g., name, birth, death, etc.) if I felt they deserved or needed their own proof statement. Which I think is where you are coming from also.

Yeah, GEDCOM can do it, but only if the software vendor allows the creating of the necessary links and fields.

So you and I think proof statements are as simple as a specific type of free-format notes that decorate conclusion objects at logical places, and that there is no need to complicate the model with additional data types. I wonder how many others will agree. It may be too simple for a few others!
AdrianB38 2012-07-13T12:00:28-07:00
"So you and I think proof statements are as simple as a specific type of free-format notes that decorate conclusion objects..."

Free-format notes, with accompanying optional citations, yes. (Why optional? Because you might write them informally but completely, in-line, rather than have them postponed to the end. E.g. if you write "The Parish Register of St. Mary's has these entries...", why do you need a Chicago style citation that says exactly the same thing?)

I'm slightly reluctant to agree with you, but, for the moment at least, I do. As I said above, "any attempt to break down the components of a proof argument and then stick them back together in software for printed output will probably result in a poorly constructed bit of writing." And it's pointless to have a badly written proof argument.
louiskessler 2012-07-13T12:32:55-07:00
Tom said: "I'm sure Louis will be happy with the conclusion that GEDCOM can already do it."

I've seen lots of GEDCOMs where people use the notes attached to the source references to contain their proof and reasoning.

Unfortunately, GEDCOM carries it a bit too far.

It not only (1) attaches all the conclusions and evaluation of quality of the source (QUAY) to the source reference, but it (2) also attaches the source details and even the citation to the source reference.

The first part is good. But in the 2nd case, those source details as well as the citation, should be attached to the source itself. Otherwise, e.g. if you're using one paragraph in a book to qualify eight facts, then that source detail information is repeated eight times, and that phenomenon exists in the majority of GEDCOMs I've looked at.

In Behold, I look for identical source details and pull them out and add them as subentries to the source they are subservient to. Then I label them as 1-1, 1-2, 2-1, meaning Source 1, Detail 1; Source 1, Detail 2; Source 2, Detail 1, and include them in a Source Details section. The events then refer to the source detail, and the source details refer back to all the events they help document. For illustration, see: http://www.beholdgenealogy.com/screen5.gif
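
The labelling scheme just described can be sketched in a few lines of Python. This is not Behold's actual code; the function name, the tuple layout and the sample citations are all assumptions made for illustration.

```python
def index_source_details(citations):
    """Pull identical source details out from under the events that repeat them.

    citations: list of (event, source, detail) tuples in document order.
    Returns (detail_labels, events_by_label) where labels look like "1-2",
    meaning Source 1, Detail 2.
    """
    source_numbers = {}   # source -> 1-based source number
    detail_counts = {}    # source -> distinct details seen so far
    detail_labels = {}    # (source, detail) -> label
    events_by_label = {}  # label -> events that the detail helps document
    for event, source, detail in citations:
        if source not in source_numbers:
            source_numbers[source] = len(source_numbers) + 1
            detail_counts[source] = 0
        key = (source, detail)
        if key not in detail_labels:
            detail_counts[source] += 1
            detail_labels[key] = f"{source_numbers[source]}-{detail_counts[source]}"
        events_by_label.setdefault(detail_labels[key], []).append(event)
    return detail_labels, events_by_label


# Hypothetical citations: the same register page is cited for two events.
citations = [
    ("birth of A", "Parish register", "p. 12"),
    ("baptism of A", "Parish register", "p. 12"),
    ("marriage of A", "Parish register", "p. 40"),
    ("death of A", "Census return", "fol. 3"),
]
labels, events = index_source_details(citations)
for label in sorted(events):
    print(label, events[label])
```

Each distinct detail is stored once under its source, and the back-references from detail to events fall out of the same pass.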

I believe that is the way it should be done.

Louis
ACProctor 2012-07-16T03:08:44-07:00
Hi folks!

I've just finished a major revision to STEMMA, which I'm still using to represent my own data. I've been intending to take the draft specification to a proper V1 for a while but kept getting side-tracked.

The relevance here is that STEMMA has always pushed the subject of Structured Narrative very hard for family history data. As well as using a semantic mark-up to flag references to Persons, Places, Events, Citations, etc., its narrative architecture also supports general reference notes, and a way of representing "reasoning" and "conclusions" as distinct from evidence.

Some of the new uploaded content includes a Data Model section where the model is applied to a number of case studies. This is also relevant as it includes the use of narrative for E&C (echoing much of what has been said here), and multi-source Events that involve multiple Persons (which sounds relevant to Louis's last post).

Tony
ACProctor 2012-07-08T16:34:17-07:00
The disparity between the two supports my view that a global data standard should not tie itself to a given procedural standard Adrian -- although products and personal preference are welcome to.

I have always said that a new standard should not be designed around GPS or any similar standard. As long as it clearly differentiates evidence, reasoning, and conclusion then that is enough. Anything else will limit its adoption, fragment the community, and eventually curtail its lifespan.

Tony
GeneJ 2012-07-08T17:18:09-07:00
Oh my.

Now, I didn't interpret Ms. Mills's post "What Constitutes proof" to be contradictory to the GPS at all, so I don't find Tony's "disparity."

The Genealogical Proof Standard (GPS) is published by the Board for Certification of Genealogists (BCG). The GPS is a sort of super statement, but it is one of a whole series of standards published by the BCG. The BCG is a living body.

Mills wrote another article that folks might find interesting, "Mindmapping Records."
https://www.evidenceexplained.com/content/quicklesson-6-mindmapping-records

Tony wrote about designing a standard "around" the GPS. I think of the work on the model differently. That is, the model should support how genealogists actually work, and the related information requirements. For me, BCG standards are relevant (not limiting) in that context.
ACProctor 2012-07-08T17:44:57-07:00
Disparity is not the same as contradictory Gene.

The difference, though, highlights the fact that no single procedure (or definition) for 'proof' is fundamental or unique.

It is highly likely that other parts of the world may devise and adopt procedural standards of their own. Even in the US, or UK, no two "genealogists actually work" the same way even now.

From that point of view, the actual procedure cannot be part of the data model design. The fundamental requirement is to differentiate the different types of data (e.g. evidence, reasoning, & conclusion) and link them together appropriately.

I am not criticising the BCG, or its GPS, but it is up to individuals how they work. A data standard built around one procedural standard could become a straitjacket.

Tony
AdrianB38 2012-07-09T02:11:09-07:00
Gene
Just to be clear, I'm not saying that I _do_ find the ESM article and the GPS to be contradictory or in conflict. I _am_ saying that they are talking about the same thing in different ways and I simply haven't analysed the two enough to convince myself of the differences and similarities. For what it's worth, I suspect that the 2 views are fundamentally compatible, but can't yet be certain of that. Even on the question of how much searching is enough (issue 2 above), I suspect the two aren't seriously in conflict - but I do think that the GPS's "Reasonably exhaustive search" says it better than ESM.

Rather, for me, the major issue is that the presentational aspect of two disparate (though probably not contradictory) views of the same topic _will_ lead to confusion.
ttwetmore 2012-07-09T02:12:27-07:00
Tony,

You've captured it correctly in my opinion. ESM's long list of what constitutes proof is a wonderful list of guidelines for doing quality genealogical research. Following the list doesn't really constitute proof in any absolute sense, but following it certainly will get you as close to proof, in any given case, as it is possible to come. For all practical purposes her list encompasses the GPS -- if you follow the list you will meet the requirements of the GPS.

Your points that the data standard should not cater to a specific proof standard are important. Any model that contains a sufficient capability for storing sources (where you find stuff), evidence (the stuff you find) and conclusions (how and what you interpret the stuff to mean) will support any proof standard. The definitional statement that we saw in the GEDCOMX world, that its goal is simply to "support the GPS" is, in my opinion, a simplistic appeal to political correctness. That statement, and the definitional statement for BGEDCOM, should state their goals to be the creation of a data model that can store the information needed to support the needs of genealogical research.
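
To make the three categories concrete, here is a minimal Python sketch of sources, evidence and conclusions linked together, with the reasoning held as free-format text on the conclusion. The class and field names are hypothetical and are not taken from BetterGEDCOM, GEDCOMX or any other model; the sample data is invented.

```python
from dataclasses import dataclass, field


@dataclass
class Source:
    title: str        # where you find stuff


@dataclass
class Evidence:
    source: Source
    statement: str    # the stuff you find


@dataclass
class Conclusion:
    claim: str        # what you interpret the stuff to mean
    reasoning: str    # how you got there, as free-format narrative
    evidence: list = field(default_factory=list)

    def sources(self):
        """Distinct source titles behind this conclusion, in first-use order."""
        seen = []
        for item in self.evidence:
            if item.source.title not in seen:
                seen.append(item.source.title)
        return seen


register = Source("Parish register")
census = Source("Census return")
born = Conclusion(
    claim="A was born in 1851",
    reasoning="The baptism entry and the census age agree.",
    evidence=[
        Evidence(register, "Baptised 12 Mar 1851"),
        Evidence(census, "Age 30"),
        Evidence(register, "Marginal note gives the birth date"),
    ],
)
print(born.sources())
```

Nothing in the structure names a proof standard; a researcher following the GPS, ESM's list or some other procedure would fill in the same three kinds of record.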

Tom
AdrianB38 2012-07-09T02:44:20-07:00
Tony
I wholly agree with you that any "global data standard should not tie itself to a given procedural standard". But that actually alludes to my dissatisfaction with the GPS. (Oh dear, tilting at ESM and the GPS in two days... As Sir Humphrey Appleby would have said in "Yes Minister", "Very courageous, Minister, very courageous...")

The GPS, as it's written, is OK. Sort of. But as soon as I start analysing it (bad habit of mine) I can't decide if it's a process or a set of measures. If it were to be written in today's business world it would be written either as a process or as a series of measures on data. Compare it with standards for railway rolling stock - they prescribe measures on the size, weight, crash-worthiness, etc - not _how_ they are built. On the other hand, the signalling standards provide processes for _how_ signallers (dispatchers if you're on the other side of the Atlantic) should control trains. So which side is the GPS?

I would suggest that if the GPS were to be re-written in a form that emphasised the data and the measures on the data (e.g. if a "Soundly reasoned, coherently written conclusion" were to be broken down into its components of evidence, analysis, conclusions, etc) then the results would be generic enough to overcome Tony's concerns over being too wedded to a process. BUT such a rewriting would be non-trivial, probably overly abstract and potentially counter-productive if it replaced the current statement. I suspect I'd rather tweak it into a process with verb noun steps - e.g. "Reasonably exhaustive search" becomes "Carry out a reasonably exhaustive search" - that's more of a call-to-action, if you'll forgive what sounds close to management speak. Though I suspect the verb surely ought to be "search"!

Whichever way it's done (nudged to become a process or rewritten to become measures on data), it is crucial that, as Tony says, we understand the _data_, because that's what BG will support. And I've kind of been happy to carry on talking about the GPS on the understanding (a) that what we're really talking about is the GPS's data and (b) that the GPS is the only show in town for proof. But now we get ESM's 11 points, the question has to be asked - is the data to support ESM's 11, the same as the data to support GPS's 5?? Not sure...
AdrianB38 2012-07-09T02:55:27-07:00
I'd also agree with Tom. The only thing I'd like to add is that studying the GPS can help us decide what data to model in BG. For instance, the "Reasonably exhaustive search" bullet suggests to me that at some point in the proceedings a list of feasible potential sources (types of source?) should be produced (to act as a target list) and that therefore such a list could be modelled as part of BG. Indeed, a list of UNfeasible potential sources might also be useful - e.g. the 1940 census was UNfeasible 12 months ago but is now feasible. Which gives rise to an interesting thought - does stuff considered proven 12m ago now have to be considered not proven if the 1940 is relevant?
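
A minimal Python sketch of such a target list, assuming each potential source carries an optional date from which it can be consulted. The class and function names are hypothetical. The date used for the 1940 census is its US public release date, 2 April 2012; the second entry is invented.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class PotentialSource:
    name: str
    available_from: Optional[date] = None   # None: no known access date


def is_feasible(source, as_of):
    return source.available_from is not None and source.available_from <= as_of


def split_by_feasibility(sources, as_of):
    """Return (feasible, unfeasible) source names as of a given date."""
    feasible = [s.name for s in sources if is_feasible(s, as_of)]
    unfeasible = [s.name for s in sources if not is_feasible(s, as_of)]
    return feasible, unfeasible


def needs_review(sources, proven_on, today):
    """Sources that became feasible after a conclusion was marked proven."""
    return [
        s.name
        for s in sources
        if not is_feasible(s, proven_on) and is_feasible(s, today)
    ]


targets = [
    PotentialSource("1940 census", date(2012, 4, 2)),
    PotentialSource("Sealed record"),   # hypothetical, never yet accessible
]
print(split_by_feasibility(targets, date(2011, 7, 9)))
print(split_by_feasibility(targets, date(2012, 7, 9)))
print(needs_review(targets, proven_on=date(2011, 7, 9), today=date(2012, 7, 9)))
```

The last call is one way of phrasing the closing question: it lists the sources that a search done 12 months earlier could not have covered, leaving the researcher to decide whether the earlier conclusion still stands.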
ACProctor 2012-07-09T03:06:53-07:00
I see what you're saying Adrian, although I haven't tried to analyse the GPS & ESM's list as deeply as yourself.

I am not really worried about the GPS, or ESM's list, being essentially procedural. As Tom says, they provide excellent guidelines for successful substantiated research, and so should be supported.

However, support for those guides is not the same as building a 'data model' around them. It may be a subtle point, although Tom got it when he commented that a good description of the data that correctly categorises the various parts will implicitly support GPS and any other genealogical research standard, now or in the future.

Reconciling GPS and ESM, which are different but not contradictory, is an issue I feel less strongly about.

Tony
AdrianB38 2012-07-09T04:21:00-07:00
Tony - I think we're in basic agreement, even over your subtle point. It's just that I am less confident of our ability to correctly categorise the various parts starting from nothing. It's possibly worth pointing out that the components of a proof in maths (where I start from and thus driving my thoughts) are nothing like the components of a "proof" in family history / genealogy - see? I even had to put quotes round the 2nd use of the word! Hence my concern that the data components comprising a proof might not be as self-evident as desirable and hence my desire to use GPS and / or the ESM process as inspiration - or consider it as a test of the model, if you like.

Adrian
ACProctor 2012-07-09T04:47:23-07:00
Thanks Adrian. My categorisation comes from my personal research experience - experience that doesn't use off-the-shelf products and which makes copious use of narrative.

I'm therefore open to alternative categorisations, and inputs from BCG and ESM :-)

My own background is mathematics and physics. Hence, I hear something specific when I hear the word "proof". As you say, absolute proof is only possible in the mathematical disciplines.

Tony